Preface


This volume contains the basics of Zermelo-Fraenkel axiomatic set theory. It is 
situated between two opposite poles: On one hand there are elementary texts that 
familiarize the reader with the vocabulary of set theory and build set-theoretic 
tools for use in courses in analysis, topology, or algebra — but do not get into 
metamathematical issues. On the other hand are those texts that explore issues 
of current research interest, developing and applying tools (constructibility, 
absoluteness, forcing, etc.) that are aimed to analyze the inability of the axioms 
to settle certain set-theoretic questions. 

Much of this volume just “does set theory”, thoroughly developing the theory 
of ordinals and cardinals along with their arithmetic, incorporating a careful dis- 
cussion of diagonalization and a thorough exposition of induction and inductive 
(recursive) definitions. Thus it serves well those who simply want tools to ap- 
ply to other branches of mathematics or mathematical sciences in general (e.g., 
theoretical computer science), but also want to find out about some of the subtler 
results of modern set theory. 

Moreover, a fair amount is included towards preparing the advanced reader 
to read the research literature. For example, we pay two visits to Gödel’s
constructible universe, the second of which concludes with a proof of the relative
consistency of the axiom of choice and of the generalized continuum hypothesis 
with ZF. As such a program requires, I also include a thorough discussion of 
formal interpretations and absoluteness. The lectures conclude with a short but 
detailed study of Cohen forcing and a proof of the non-provability in ZF of the 
continuum hypothesis. 

The level of exposition is designed to fit a spectrum of mathematical sophis- 
tication, from third-year undergraduate to junior graduate level (each group will 
find here its favourite chapters or sections that serve its interests and level of 
preparation). 


The volume is self-contained. Whatever tools one needs from mathematical 
logic have been included in Chapter I. Thus, a reader equipped with a com- 
bination of sufficient mathematical maturity and patience should be able to 
read it and understand it. There is a trade-off: the less the maturity at hand, the 
more the supply of patience must be. To pinpoint this “maturity”: At least two 
courses from among calculus, linear algebra, and discrete mathematics at the 
junior level should have exposed the reader to sufficient diversity of mathemat- 
ical issues and proof culture to enable him or her to proceed with reasonable 
ease. 


A word on approach. I use the Zermelo-Fraenkel axiom system with the axiom 
of choice (AC). This is the system known as ZFC. As many other authors do, I 
simplify nomenclature by allowing “proper classes” in our discussions as part 
of our metalanguage, but not in the formal language. 

I said earlier that this volume contains the “basics”. I mean this charac- 
terisation in two ways: One, that all the fundamental tools of set theory as needed 
elsewhere in the mathematical sciences are included in detailed exposition. Two, 
that I do not present any applications of set theory to other parts of mathematics, 
because space considerations, along with a decision to include certain advanced 
relative consistency results, have prohibited this. 

“Basics” also entails that I do not attempt to bring the reader up to speed 
with respect to current research issues. However, a reader who has mastered 
the advanced metamathematical tools contained here will be able to read the 
literature on such issues. 

The title of the book reflects two things: One, that all good short titles are 
taken. Two, more importantly, it advertises my conscious effort to present the 
material in a conversational, user-friendly lecture style. I deliberately employ 
classroom mannerisms (such as “pauses” and parenthetical “why”s, “what if”s,
and attention-grabbing devices for passages that I feel are important). This 
aims at creating a friendly atmosphere for the reader, especially one who has 
decided to study the topic without the guidance of an instructor. Friendliness 
also means steering clear of the terse axiom-definition-theorem recipe, and 
explaining how some concepts were arrived at in their present form. In other 
words, what makes things tick. Thus, I approach the development of the key 
concepts of ordinals and cardinals, initially and tentatively, in the manner they 
were originally introduced by Georg Cantor (paradox-laden and all). Not only 
does this afford the reader an understanding of why the modern (von Neumann) 
approach is superior (and contradiction-free), but it also shows what it tries to 
accomplish. In the same vein, Russell’s paradox is visited no less than three 
times, leaving us in the end with a firm understanding that it has nothing to do
with the “truth” or otherwise of the much-maligned statement “x ∈ x” but it is
just the result of a diagonalization of the type Cantor originally taught us.


A word on coverage. Chapter I is our “Chapter 0”. It contains the tools needed
to enable us to do our job properly: a bit of mathematical logic, certainly no more
than necessary. Chapter II informally outlines what we are about to describe
axiomatically: the universe of all the “real” sets and other “objects” of our
intuition, a caricature of the von Neumann “universe”. It is explained that the
whole fuss about axiomatic set theory† is to have a formal theory derive true
statements about the von Neumann sets, thus enabling us to get to know the 
nature and structure of this universe. If this is to succeed, the chosen axioms 
must be seen to be “true” in the universe we are describing. 

To this end I ensure via informal discussions that every axiom that is intro- 
duced is seen to “follow” from the principle of the formation of sets by stages, or 
from some similarly plausible principle devised to keep paradoxes away. In this 
manner the reader is constantly made aware that we are building a meaningful 
set theory that has relevance to mathematical intuition and expectations (the 
“real” mathematics), and is not just an artificial choice of a contradiction-free 
set of axioms followed by the mechanical derivation of a few theorems. 

With this in mind, I even make a case for the plausibility of the axiom of 
choice, based on a popularization of Gödel’s constructible universe argument.
This occurs in Chapter IV and is informal. 


The set theory we do allows atoms (or Urelemente),‡ just like Zermelo’s.
The re-emergence of atoms has been defended aptly by Jon Barwise (1975) and 
others on technical merit, especially when one does “restricted set theories” 
(e.g., theory of admissible sets). 

Our own motivation is not technical; rather it is philosophical and ped- 
agogical. We find it extremely counterintuitive, especially when addressing 
undergraduate audiences, to tell them that all their familiar mathematical 
objects — the “stuff of mathematics” in Barwise’s words — are just perverse 
“box-in-a-box-in-a-box ...” formations built from an infinite supply of empty
boxes. For example, should I be telling my undergraduate students that their
familiar number “2” really is just a short name for something like “{∅, {∅}}”?
And what will I tell them about “√2”?


† O.K., maybe not the whole fuss. Axiomatics also allow us to meaningfully ask, and attempt to
answer, metamathematical questions of derivability, consistency, relative consistency, independence.
But in this volume much of the fuss is indeed about learning set theory.

‡ Allows, but does not insist that there are any.
Some mathematicians have said that set theory (without atoms) speaks only
of sets and it chooses not to speak about objects such as cows or fish (colourful
terms for urelements). Well, it does too! Such (“atomless”) set theory is known
to be perfectly capable of constructing “artificial” cows and fish, and can then 
proceed to talk about such animals as much as it pleases. 

While atomless ZFC has the ability to construct or codify all the familiar 
mathematical objects in it, it does this so well that it betrays the prime directive 
of the axiomatic method, which is to have a theory that applies to diverse 
concrete (meta, i.e., outside the theory and in the realm of “everyday math”)
mathematical systems. Group theory and projective geometry, for example, 
fulfill the directive. 

In atomless ZFC the opposite appears to be happening: One is asked to 
embed the known mathematics into the formal system. 

We prefer a set theory that allows both artificial and real cows and fish, so that 
when we want to illustrate a point in an example utilizing, say, the everyday set 
of integers, Z, we can say things like “let the atoms (be interpreted to) include 
the members of Z...”. 

But how about technical convenience? Is it not hard to include atoms in a 
formal set theory? In fact, not at all! 


A word on exposition devices. I freely use a pedagogical feature that, I believe, 
originated in Bourbaki’s books — that is, marking an important or difficult topic 
by placing a “winding road” sign in the margin next to it. I am using here the 
same symbol that Knuth employed in his TeXbook, namely, @, marking with
it the beginning and end of an important passage. 


Topics that are advanced, or of the “read at your own risk” type, can be 
omitted without loss of continuity. They are delimited by a double sign, @. 


Most chapters end with several exercises. I have stopped making attempts to 
sort exercises between “hard” and “just about right”, as such classifications are
rather subjective. In the end, I'll pass on to you the advice one of my professors 
at the University of Toronto used to offer: “Attempt all the problems. Those you 
can do, don’t do. Do the ones you cannot”. 


What to read. Just as in the advice above, I suggest that you read everything 
that you do not already know if time is no object. In a class environment the 
coverage will depend on class length and level, and I defer to the preferences of 
the instructor. I suppose that a fourth-year undergraduate audience ought to see 
the informal construction of the constructible universe in Chapter IV, whereas 
a graduate audience would rather want to see the formal version in Chapter VI. 
The latter group will probably also want to be exposed to Cohen forcing. 

A Bit of Logic: A User’s Toolbox


This prerequisite chapter (what some authors call a “Chapter 0”) is an abridged
version of Chapter I of volume 1 of my Lectures in Logic and Set Theory. It is
offered here just in case that volume, Mathematical Logic, is not readily accessible.

Simply put, logic† is about proofs or deductions. From the point of view of
the user of the subject — whose best interests we attempt to serve in this chapter — 
logic ought to be just a toolbox which one can employ to prove theorems, for 
example, in set theory, algebra, topology, theoretical computer science, etc. 

The volume at hand is about an important specimen of a mathematical theory, 
or logical theory, namely, axiomatic set theory. Another significant example, 
which we do not study here, is arithmetic. Roughly speaking, a mathematical 
theory consists on one hand of assumptions that are specific to the subject 
matter — the so-called axioms — and on the other hand a toolbox of logical rules. 
One usually performs either of the following two activities with a mathematical 
theory: One may choose to work within the theory, that is, employ the tools and 
the axioms for the sole purpose of proving theorems. Or one can take the entire 
theory as an object of study and study it “from the outside” as it were, in order to 
pose and attempt to answer questions about the power of the theory (e.g., “does 
the theory have as theorems all the ‘true’ statements about the subject matter?”),
its reliability (meaning whether it is free from contradictions or not), how its 
reliability is affected if you add new assumptions (axioms), etc. 


Our development of set theory will involve both types of investigations indi- 
cated above: 


(1) Primarily, we will act as users of logic in order to deduce “true” state- 
ments about sets (i.e., theorems of set theory) as consequences of certain 


† We drop the qualifier “mathematical” from now on, as this is the only type of logic we are about.
“obviously true”† statements that we accept up front without proof, namely,
the ZFC axioms.‡ This is pretty much analogous to the behaviour of a
geometer whose job is to prove theorems of, say, Euclidean geometry. 

(2) We will also look at ZFC from the outside and address some issues of the 
type “is such and such a sentence (of set theory) provable from the axioms 
of ZFC and the rules of logic alone?” 


It is evident that we need a precise formulation of set theory, that is, we must 
turn it into a mathematical object in order to make task (2), above, a meaningful 
mathematical activity.§ This dictates that we develop logic itself formally, and
subsequently set theory as a formal theory. 

Formalism,¶ roughly speaking, is the abstraction of the reasoning processes
(proofs) achieved by deleting any references to the “truth content” of the com- 
ponent mathematical statements (formulas). What is important in formalist 
reasoning is solely the syntactic form of (mathematical) statements as well as 
that of the proofs (or deductions) within which these statements appear. 

A formalist builds an artificial language, that is, an infinite (but finitely
specifiable‖) collection of “words” (meaning symbol sequences, also called
expressions). He†† then uses this language in order to build deductions (that
is, finite sequences of words) in such a manner that, at each step, he writes
down a word if and only if it is “certified” to be syntactically correct to do so. 
“Certification” is granted by a toolbox consisting of the very same rules of logic 
that we will present in this chapter. 

The formalist may pretend, if he so chooses, that the words that appear in a 
proof are meaningless sequences of meaningless symbols. Nevertheless, such 
posturing cannot hide the fact that (in any purposefully designed theory) these 


† We often quote a word or cluster of related words as a warning that the crude English meaning
is not necessarily the intended meaning, or it may be ambiguous. For example, the first “true”
in the sentence where this footnote originates is technical, but in a first approximation may be
taken to mean what “true” means in English. “Obviously true” is an ambiguous term. Obvious to
whom? However, the point is (to introduce another ambiguity) that “reasonable people” will
accept the truth of the (ZFC) axioms.

‡ This is an acronym reflecting the names of Zermelo and Fraenkel (the founders of this particular
axiomatization) and the fact that the so-called axiom of choice is included.

§ Here is an analogy: It is the precision of the rules for the game of chess that makes the notion of
analyzing a chessboard configuration meaningful.

¶ The person who practises formalism is a formalist.

‖ The finite specification is achieved by a finite collection of “rules”, repeated applications of which
build the words.

†† By definition, “he”, “his”, “him” (and their derivatives) are gender-neutral in this volume.


words codify “true” (intuitively speaking) statements. Put bluntly, we must have 
something meaningful to talk about before we bother to codify it. 

Therefore, a formal theory is a laboratory version (artificial replica or sim- 
ulation, if you will) of a “real” mathematical theory of the type encountered 
in mathematics,† and formal proofs do unravel (codified versions of) “truths”
beyond those embodied in the adopted axioms. 


It will be reassuring for the uninitiated that it is a fact of logic that the to- 
tality of the “universally true” statements — that is, those that hold in all of 
mathematics and not only in specific theories — coincides with the totality 
of statements that we can deduce purely formally from some simple univer- 
sally true assumptions such as x = x, without any reference to meaning or 
“truth” (Gödel’s completeness theorem for first order logic). In short, in this
case formal deducibility is as powerful as “truth”. The flip side is that formal
deducibility cannot be as powerful as “truth” when it is applied to specific
mathematical theories such as set theory or arithmetic (Gödel’s incompleteness
theorem). 


Formalization allows us to understand the deeper reasons that have pre- 
vented set theorists from settling important questions such as the continuum 
hypothesis — that is, the statement that there are no cardinalities between that of 
the set of natural numbers and that of the set of the reals. This understanding is 
gathered by “running diagnostics” on our laboratory replica of set theory. That 
is, just as an engineer evaluates a new airplane design by building and testing 
a model of the real thing, we can find out, with some startling successes, what 
are the limitations of our theory, that is, what our assumptions are incapable of 
logically implying.‡ If the replica is well built,§ we can then learn something
about the behaviour of the real thing. 

In the case of formal set theory and, for example, the question of our failure
to resolve the continuum hypothesis, such diagnostics (the methods of Gödel
and Cohen; see Chapters VI and VIII) return a simple answer: We have not
included enough assumptions in (whether “real” or “formal”) set theory to settle
this question one way or another.


† Examples of “real” (non-formalized) theories are Euclid’s geometry, topology, the theory of
groups, and, of course, Cantor’s “naive” or “informal” set theory.

‡ In model theory “model” means exactly the opposite of what it means here. A model airplane
abstracts the real thing. A model of a formal (i.e., abstract) theory is a “concrete” or “real” version
of the abstract theory.

§ This is where it pays to choose reasonable assumptions, assumptions that are “obviously true”.
But what about the interests of the reader who only wants to practise set
theory, and who therefore may choose to skip the parts of this volume that just 
talk about set theory? Does, perchance, formalism put him into an unnecessary 
straitjacket? 

We think not. Actually it is easier, and safer, to reason formally than to do so 
informally. The latter mode often mixes syntax and semantics (meaning), and 
there is always the danger that the “user” may assign incorrect (i.e., convenient, 
but not general) meanings to the symbols that he manipulates, a phenomenon 
that anyone who is teaching mathematics must have observed several times 
with some distress. 

Another uncertainty one may encounter in an informal approach is this: 
“What can we allow to be a ‘property’ in mathematics?” This is an important 
question, for we often want to collect objects that share a common property, 
or we want to prove some property of the natural numbers by induction or by 
the least principle. But what is a property? Is colour a property? How about 
mood? It is not enough to say, “no, these are not properties”, for these are 
just two frivolous examples. The question is how to describe accurately and 
unambiguously the infinite variety of properties that are allowed. Formalism 
can do just that.†

“Formalism for the user” is not a revolutionary slogan. It was advocated
by Hilbert, the founder of formalism, partly as a means of (as he believed‡)
formulating mathematical theories in a manner that allows one to check them
(i.e., run diagnostic tests on them) for freedom from contradiction,§ but also as
the right way to do mathematics. By this proposal he hoped to salvage mathematics
itself, which, Hilbert felt, was about to be destroyed by the Brouwer
school of intuitionist thought. In a way, his program could bridge the gap 
between the classical and the intuitionist camps, and there is some evidence 
that Heyting (an influential intuitionist and contemporary of Hilbert) thought 
that such a rapprochement was possible. After all, since meaning is irrelevant 
to a formalist, all that he is doing (in a proof) is shuffling finite sequences of 


† Well, almost. So-called cardinality considerations make it impossible to describe all “good”
properties formally. But, practically and empirically speaking, we can define all that matter for
“doing mathematics”.

‡ This belief was unfounded, as Gödel’s incompleteness theorems showed.

§ Hilbert’s metatheory (that is, the “world” or “lab” outside the theory, where the replica is
actually manufactured) was finitary. Thus, Hilbert believed, all this theory building and
theory checking ought to be effected by finitary means. This was another ingredient that was
consistent with peaceful coexistence with the intuitionists. And, alas, this ingredient was the one
that, as some writers put it, destroyed Hilbert’s program to found mathematics on his version
of formalism. Gödel’s incompleteness theorems showed that a finitary metatheory is not up to
the task.
symbols, never having to handle or argue about infinite objects (a good thing,
as far as an intuitionist is concerned).†

In support of the “formalism for the user” position we must not fail to 
mention Bourbaki’s (1966a) monumental work, which is a formalization of a 
huge chunk of mathematics, including set theory, algebra, topology, and theory 
of integration. This work is strictly for the user of mathematics, not for the 
metamathematician who studies formal theories. Yet, it is fully formalized, 
true to the spirit of Hilbert, and it comes in a self-contained package, including 
a “Chapter 0” on formal logic. 

More recently, the proposition of employing formal reasoning as a tool has 
been gaining support in a number of computer science undergraduate curricula, 
where logic and discrete mathematics are taught in a formalized setting, starting 
with a rigorous course in the two logical calculi (propositional and predicate), 
emphasizing the point of view of the user of logic (and mathematics) — hence 
with an attendant emphasis on calculating (i.e., writing and annotating formal) 
proofs. Pioneering works in this domain are the undergraduate text (1994) and 
the paper (1995) of Gries and Schneider. 

You are urged to master the technique of writing formal proofs by studying 
how we go about it throughout this volume, especially in Chapter III.‡ You will
find that writing and annotating formal proofs is a discipline very much like
computer programming, so it cannot be that hard. Computer programming is
taught in the first year, isn’t it?§


† True, a formalist applies classical logic, while an intuitionist applies a different logic where, for
example, double negation is not removable. Yet, unlike a Platonist, a formalist does not believe
(or he does not have to disclose to his intuitionist friends that he might do) that infinite sets exist
in the metatheory, as his tools are just finite symbol sequences. To appreciate the tension here,
consider this anecdote: It is said that when Kronecker (the father of intuitionism) was informed
of Lindemann’s proof (1882) that π is transcendental, while he granted that this was an interesting
result, he also dismissed it, suggesting that π (whose decimal expansion is, of course, infinite
but not periodic) “does not exist” (see Wilder (1963, p. 193)). We do not propound the tenets of
intuitionism here, but it is fair to state that infinite sets are possible in intuitionistic mathematics
as this has later evolved in the hands of Brouwer and his Amsterdam school. However, such
sets must be (like all sets of intuitionistic mathematics) finitely generated, just like our formal
languages and the set of theorems (the latter provided that our axioms are too), in a sense that
may be familiar to some readers who have had a course in automata and language theory. See
Wilder (1963, p. 234).

‡ Many additional paradigms of formal proofs, in the context of arithmetic, are found in Chapter II
of volume 1 of these Lectures.

§ One must not gather the impression that formal proofs are just obscure sequences of symbol
sequences akin to Morse code. Just as one does in computer programming, one also uses comments
in formal proofs, that is, annotations (in English, Greek, or your favourite natural language)
that aim to explain or justify for the benefit of the reader the various proof steps. At some point,
when familiarity allows and the length of (formal) proofs becomes prohibitive, we agree to relax
the proof style. Read on!
It is also fair to admit, in defense of “semantic reasoning”, that meaning is
an important tool for formulating conjectures, for analyzing a given proof in 
order to figure out what makes it tick, or indeed for discovering the proof, in 
rough outline, in the first place. For these very reasons we supplement many of 
our formal arguments in this volume with discussions that are based on intuitive 
semantics, and with several examples taken from informal mathematics. 


We forewarn the reader of the inevitability with which the informal language 
of sets already intrudes in this chapter (as it indeed does in all mathematics). 
More importantly, some of the elementary results of Cantorian naive set theory 
are needed here. Conversely, formal set theory needs the tools and some of the 
results developed here. This apparent “chicken or egg” phenomenon is often
called “bootstrapping”,† not to be confused with “circularity”, which it is not:
Only informal set theory notation and results are needed here in order to found 
formal set theory. 


This is a good place to summarize our grand plan:


First (in this chapter), we will formalize the rules of reasoning in general — as 
these apply to all mathematics — and develop their properties. We will skip the 
detailed study of the interaction between formalized rules and their intended 
meaning (semantics), as well as the study of the limitations of these formalized 
rules. Nevertheless, we will state without proof the relevant important results 
that come into play here, the completeness and incompleteness theorems (both 
due to Kurt Gödel).


Secondly (starting with the next chapter), once we have learnt about these 
tools of formalized reasoning — what they are and how to use them — we will 
next become users of formal logic so that we can discover important theorems 
of (or, as we say, develop) set theory. Of course, we will not forget to run a few 
diagnostics. For example, Chapter VIII is entirely on metamathematical issues. 


Formal theories, and their artificial languages, are defined (built) and “tested” 
within informal mathematics (the latter also called “real” mathematics by 
Platonists). The first theory that we build here is general-purpose, or “pure”,
formal logic. We can then build mathematical formal theories (e.g., set theory) 
by just adding “impurities”, namely, the appropriate special symbols and ap- 
propriate special assumptions (written in the artificial formal language). 

We describe precisely how we construct these languages and theories using 
the usual abundance of mathematical notation, notions, and techniques available 


† The term “bootstrapping” is suggestive of a person pulling himself up by his bootstraps. Reputedly,
this technique, which is pervasive, among others, in the computer programming field (as
alluded to in the term “booting”), was invented by Baron Münchhausen.
to us, augmented by the descriptive power of natural language (e.g., English,
or Greek, or French, or German, or Russian), as particular circumstances or
geography might dictate. This milieu within which we build, pursue, and study
our theories (besides “real mathematics”) is also often called the metatheory,
or more generally, metamathematics. The language we speak while at it, this
mélange of mathematics and natural language, is the metalanguage.


I.1. First Order Languages


In the most abstract and thus simplest manner of describing it, a formalized
mathematical theory (also, formalized logical theory) consists of the following
sets of things: a set of basic or primitive symbols, 𝒱, used to build symbol
sequences (also called strings, or expressions, or words, over 𝒱); a set of
strings, Wff, over 𝒱, called the formulas of the theory; and finally, a subset of
Wff, Thm, the set of theorems of the theory.†

Well, this is the extension of a theory, that is, the explicit set of objects in it. 
How is a theory given? 

In most cases of interest to the mathematician it is given by specifying 𝒱 and
two sets of simple rules, namely, formula-building rules and theorem-building
rules. Rules from the first set allow us to build, or generate, Wff from 𝒱.
The rules of the second set generate Thm from Wff. In short (e.g., Bourbaki 
(1966b)), a theory consists of an alphabet of primitive symbols and rules used 
to generate the “language of the theory” (meaning, essentially, Wff) from these 
symbols, and some additional rules used to generate the theorems. We expand 
on this below. 


I.1.1 Remark. What is a rule? We run the danger of becoming circular or too
pedantic if we overdefine this notion. Intuitively, the rules we have in mind
are string manipulation rules, that is, “black boxes” (or functions) that receive
string inputs and respond with string outputs. For example, a well-known
theorem-building rule receives as input a formula and a variable, and it returns
(essentially) the string composed of the symbol ∀, immediately followed by the
variable and, in turn, immediately followed by the formula.‡
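
The “black box” view of a rule can be illustrated with a few lines of Python. The sketch below is only an illustration under stated assumptions: the function name generalization_rule and the decision to represent formulas and variables as plain Python strings are ours, and the bracketing conventions of the actual formal language are ignored.

```python
def generalization_rule(formula: str, variable: str) -> str:
    """Return the string: the symbol ∀, then the variable, then the formula.

    Hypothetical illustration: the rule inspects only the form of its
    inputs (they are strings) and never their meaning.
    """
    return "∀" + variable + formula


# Example: feed the rule the formula "(x=x)" and the variable "x".
print(generalization_rule("(x=x)", "x"))
```

The function never asks what its inputs “mean”; it only concatenates symbols, which is all that is required of a string manipulation rule.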


(1) First off, the (first order)§ formal language, L, where the theory is “spoken”
is a triple (𝒱, Term, Wff), that is, it has three important components, each
of them a set. 𝒱 is the alphabet (or vocabulary) of the language. It is the


† For a less abstract, but more detailed view of theories see p. 39.
‡ This rule is usually called “generalization”.
§ We will soon say what makes a language “first order”.
collection of the basic syntactic “bricks” (symbols) that we use to form
symbol sequences (or expressions) that are terms (members of Term) or
formulas (members of Wff). We will ensure that the processes that build
terms or formulas, using the basic building blocks in 𝒱, are (intuitively)
algorithmic (“mechanical”). Terms will formally codify objects, while formulas
will formally codify statements about objects.

(2) Reasoning in the theory will be the process of discovering “true statements” 
about objects — that is, theorems. This discovery journey begins with cer- 
tain formulas which codify statements that we take for granted (i.e., accept 
without proof as “basic truths”). Such formulas are the axioms. There are 
two types of axioms. Special, or nonlogical, axioms are to describe specific 
aspects of any theory that we might be building; they are “basic truths” 
in a restricted context. For example, “x + 1 4 0” is a special axiom that 
contributes towards the characterization of number theory over N. This is a 
“basic truth” in the context of N but is certainly not true of the integers or the 
rationals, which is good, because we do not want to confuse N with the
integers or the rationals. The other kind of axiom will be found in all theories.
It is the kind that is “universally valid”, that is, not a theory-specific truth
but one that holds in all branches of mathematics (for example, “x = x” is 
such a universal truth). This is why this type of axiom will be called logical. 

(3) Finally, we will need rules for reasoning, actually called rules of inference. 
These are rules that allow us to deduce, or derive, a true statement from other 
statements that we have already established as being true. These rules will 
be chosen to be oblivious to meaning, being only conscious of form. They 
will apply to statement configurations of certain recognizable forms and 
will produce (derive) new statements of some corresponding recognizable 
forms (see Remark I.1.1). 


I.1.2 Remark. We may think of axioms (of either logical or nonlogical type) as
being special cases of rules, that is, rules that receive no input in order to produce 
an output. In this manner item (2) above is subsumed by item (3), thus we are 
faithful to our abstract definition of theory (where axioms were not mentioned). 

An example, outside mathematics, of an inputless rule is the rule invoked 
when you type date on your computer keyboard. This rule receives no input, 
and outputs the current date on your screen. 
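
The “black box” view of rules described above can be sketched in Python. This is only an illustration under stated assumptions: the function names (generalization, reflexivity_axiom, current_date) are hypothetical, and generalization is rendered literally as “∀, then the variable, then the formula”, without the bracketing that the formal syntax adds later.

```python
from datetime import date


def generalization(formula: str, variable: str) -> str:
    # Two-input rule: the symbol ∀, then the variable, then the formula.
    return "∀" + variable + formula


def reflexivity_axiom() -> str:
    # Inputless rule: an axiom viewed as a rule that needs no input.
    return "x = x"


def current_date() -> str:
    # Inputless rule outside mathematics, in the spirit of typing `date`.
    return date.today().isoformat()


print(generalization("(x = x)", "x"))
print(reflexivity_axiom())
print(current_date())
```

Note that none of these functions inspects what its strings mean; each one only rearranges or emits symbols, which is the sense in which rules are oblivious to meaning.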


We next look carefully into (first order) formal languages. 


† The generous use of the term “true” here is only meant to motivate. “Provable” or “deducible”
formula, or “theorem”, will be the technically precise terminology that we will soon define to 
replace the term “true statement”. 
There are two parts in each first order alphabet. The first, the collection of
the logical symbols, is common to all first order languages (regardless of which 
theory is spoken in them). We describe this part immediately below. 


Logical Symbols. 


LS.1. Object or individual variables. An object variable is any one symbol out
of the unending sequence v0, v1, v2, .... In practice, whether we are
using logic as a tool or as an object of study, we agree to be sloppy with
notation and use, generically, x, y, z, u, v, w with or without subscripts
or primes as names of object variables.† This is just a matter of notational
convenience. We allow ourselves to write, say, z instead of, say,
v120000000000560000009. Object variables (intuitively) “vary over” (i.e., are
allowed to take values that are) the objects that the theory studies (e.g.,
numbers, sets, atoms, lines, points, etc., as the case may be).

LS.2. The Boolean or propositional connectives. These are the symbols “¬”
and “∨”.‡ These are pronounced not and or respectively.

LS.3. The existential quantifier, that is, the symbol “∃”, pronounced exists or
for some.

LS.4. Brackets, that is, “(” and “)”.

LS.5. The equality predicate. This is the symbol “=”, which we use to indicate 
that objects are “equal”. It is pronounced equals. 


The logical symbols will have a fixed interpretation. In particular, “=” will 
always be expected to mean equals. 


The theory-specific part of the alphabet is not fixed, but varies from theory 
to theory. For example, in set theory we just add the nonlogical (or special)
symbols, ∈ and U. The first is a special predicate symbol (or just predicate) of
arity 2; the second is a predicate symbol of arity 1.

In number theory we adopt instead the special symbols S (intended meaning:
successor, or “+ 1”, function), +, ×, 0, <, and (sometimes) a symbol for the
exponentiation operation (function) a^b. The first three are function symbols of
arities 1, 2, and 2 respectively. 0 is a constant symbol, < is a predicate of arity 2,
and whatever symbol we might introduce to denote a^b would have arity 2.

† Conventions such as this one are essentially agreements, effected in the metatheory, on how to
be sloppy and get away with it. They are offered in the interest of user-friendliness and readability.
There are also theory-specific conventions, which may allow additional names in our informal
(metamathematical) notation. Such examples, in set theory, occur in the following chapters.

‡ The quotes are not part of the symbol. They serve to indicate clearly, e.g., in the case of “∨” here,
what is part of the symbol and what is not (the following period is not).

§ “arity” is derived from “ary” of “unary”, “binary”, etc. It denotes the number of arguments
needed by a symbol according to the dictates of correct syntax. Function and predicate symbols
need arguments.

The following list gives the general picture. 


Nonlogical Symbols. 


NLS.1. A (possibly empty) set of symbols for constants. We normally use
the metasymbols† a, b, c, d, e, with or without primes or subscripts, to
stand for constants unless we have in mind some alternative “standard”
formal notation in specific theories (e.g., ∅, 0, ω).

NLS.2. A (possibly empty) set of symbols for predicate symbols or relation
symbols for each possible arity n > 0. We normally use P, Q, R,
generically, with or without primes or subscripts, to stand for predicate
symbols. Note that = is in the logical camp. Also note that theory-specific
formal symbols are possible for predicates, e.g., <, ∈, U.

NLS.3. Finally, a (possibly empty) set of symbols for functions for each possible
arity n > 0. We normally use f, g, h, generically, with or without
primes or subscripts, to stand for function symbols. Note that theory-specific
formal symbols are possible for functions, e.g., +, ×.


I.1.3 Remark. (1) We have the option of assuming that each of the logical
symbols that we named in LS.1–LS.5 has no further structure and that the
symbols are, ontologically, identical to their names, that is, they are just these
exact signs drawn on paper (or on any equivalent display medium).

In this case, changing the symbols, say, ¬ and ∃ to ∼ and E respectively
results in a “different” logic, but one that is, trivially, isomorphic to the one we
are describing: Anything that we may do in, or say about, one logic trivially
translates to an equivalent activity in, or utterance about, the other as long as
we systematically carry out the translations of all occurrences of ¬ and ∃ to ∼
and E respectively (or vice versa).

An alternative point of view is that the symbol names are not the same as
(identical with) the symbols they are naming. Thus, for example, “¬” names
the connective we pronounce not, but we do not know (or care) exactly what
the nature of this connective is (we only care about how it behaves). Thus, the
name “¬” becomes just a typographical expedient and may be replaced by other
names that name the same object, not.

† Metasymbols are informal (i.e., outside the formal language) symbols that we use within “real”
mathematics (the metatheory) in order to describe, as we are doing here, the formal language.

This point of view gives one flexibility in, for example, deciding how the
variable symbols are “implemented”. It often is convenient to suppose that the
entire sequence of variable symbols was built from just two symbols, say, “v”
and “|”.† One way to do this is by saying that v_i is a name for the symbol
sequence

v |...| v    (with i strokes between the two occurrences of v).

Regardless of option, v_i and v_j will name distinct objects if i ≠ j.

This is not the case for the metavariables (abbreviated informal names)
x, y, z, u, v, w. Unless we say explicitly otherwise, x and y may name the
same formal variable, say, v131.

We will mostly abuse language and deliberately confuse names with the
symbols they name. For example, we will say “let v1997 be an object variable ...”
rather than “let v1997 name an object variable ...”, thus appearing to favour
option one.

(2) Any two symbols included in the alphabet are distinct. Moreover, if any
of them are built from simpler sub-symbols (e.g., v0, v1, v2, ... might really
name the strings vv, v|v, v||v, ...), then none of them is a substring (or
subexpression) of any other.‡

(3) A formal language, just like a natural language (such as English or Greek), 
is alive and evolving. The particular type of evolution we have in mind is the 
one effected by formal definitions. Such definitions continually add nonlogical 
symbols to the language.§

Thus, when we say that, e.g., “∈ and U are the only nonlogical symbols of
set theory”, we are telling a small white lie. More accurately, we ought to have
said that “∈ and U are the only ‘primitive’ (or primeval) nonlogical symbols of
set theory”, for we will add loads of other symbols such as ∪, ω, ∅, ⊂, and ⊆.

This evolution affects the (formal) language of any theory, not just that of 
set theory. 


† We intend these two symbols to be identical to their names. No philosophical or other purpose
will be served by allowing more indirection here (such as “v names u, which actually names w,
which actually is ...”).

‡ What we have stated under (2) are requirements, not metatheorems. That is, they are nothing of
the sort that we can prove about our formal language within everyday mathematics.

§ This phenomenon will be visited upon in some detail in what follows. By the way, any additions
are made to the nonlogical side of the alphabet, since all the logical symbols have been given,
once and for all.

☡ Wait a minute! If formal set theory is to serve as the foundation of all mathematics,
and if the present chapter is to assist towards that purpose, then how is
it that we are already employing natural numbers like 12000000560000009 as
subscripts in the names of object variables? How is it permissible to already talk
about “sets of symbols” when we are about to found a theory of sets formally?
Surely we do not have† any of these items yet, do we?

This protestation is offered partly in jest. We have already said that we work 
within real mathematics as we build the “replicas” or “simulators” of logic 
and set theory. Say we are Platonists. Then the entire body of mathematics — 
including infinite sets, in particular the set of natural numbers N — is available 
to us as we are building whatever we are building. 

We can thus describe how we assemble the simulator and its various parts 
using our knowledge of real mathematics, the language of real mathematics, 
and all “building blocks” available to us, including sets, infinite or otherwise, 
and natural numbers. This mathematics “exists” whether or not anyone ever 
builds a formal simulator for naive set theory, or logic for that matter. Thus any 
apparent circularity disappears. 

Now if we are not Platonists, then our mathematical “reality” is more re- 
stricted, but, nevertheless, building a simulator or not in this reality does not 
affect the existence of the reality. We will, however, this time, revise our tools. 
For example, if we prefer to think that individual natural numbers exist (up 
to any size), but not so their collection N, then it is still possible to build our 
formal languages (in particular, as many object variables as we want) — pretty 
much as already described — in this restricted metatheory. We may have to 
be careful not to say that we have an unending sequence of such variables, as
this would presume the existence of infinite sets in the metatheory.‡ We can
say instead that a variable is any object of the form v_i where i is a (meaningless)
word of (meaningless) symbols, the latter chosen out of the set or list
“0, 1, 2, 3, 4, 5, 6, 7, 8, 9”.

Clearly the above approach works even within a metatheory that has failed
to acknowledge the existence of any natural numbers.§ ☡


In this volume we will take the normal user-friendly position that is habitual
nowadays, namely, that our metatheory is the Platonist’s (infinitary)
mathematics. ☡


† “Do not have” in the sense of having not formally defined, or proved to exist, or both.

‡ A finitist would have none of it, although a post-Brouwer intuitionist would be content that such
a sequence is finitely describable.

§ Hilbert, in his finitistic metatheory, built whatever natural numbers he needed by repeating the
stroke symbol “|”.

I.1.4 Definition (Terminology about Strings). A symbol sequence, or expression
(or string), that is formed by using symbols exclusively out of a given set†
M is called a string over the set, or alphabet, M.

If A and B denote strings (say, over M), then the symbol A ∗ B, or more
simply AB, denotes the symbol sequence obtained by listing first the symbols
of A in the given left to right sequence, immediately followed by the symbols of
B in the given left to right sequence. We say that AB is (more properly, denotes
or names) the concatenation of the strings A and B in that order.

We denote the fact that the strings (named) C and D are identical sequences
(but we just say that they are equal) by writing C ≡ D. The symbol ≢ denotes
the negation of the string equality symbol ≡. Thus, if # and ? are (we do mean
“are”) symbols from an alphabet, then #?? ≡ #?? but #? ≢ #??. We can also
employ ≡ in contexts such as “let A ≡ ##?”, where we give the name A to the
string ##?.‡


In this book the symbol ≡ will be used exclusively in the metatheory as equality
of strings over some set M. ☡


The symbol λ normally denotes the empty string, and we postulate for it the
following behaviour:


A ≡ Aλ ≡ λA for all strings A.


We say that A occurs in B, or is a substring of B, iff§ there are strings C and
D such that B ≡ CAD. For example, “(” occurs four times in the (explicit)
string “¬(()∨)((”, at positions 2, 3, 7, 8. Each time this happens we have an
occurrence of “(” in “¬(()∨)((”.


If C ≡ λ, we say that A is a prefix of B. If moreover D ≢ λ, then we say
that A is a proper prefix of B.
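
The string terminology of I.1.4 translates directly into a few lines of Python. The sketch below is an illustration only: the function names are hypothetical, Python's built-in str plays the role of “string over an alphabet”, and the sample string is the example from the definition, read here as ¬(()∨)((.

```python
def occurrences(a: str, b: str) -> list:
    """1-based positions at which the string a occurs in the string b."""
    width = len(a)
    return [i + 1 for i in range(len(b) - width + 1) if b[i:i + width] == a]


def is_prefix(a: str, b: str) -> bool:
    """b is CAD with C empty, i.e. b starts with a."""
    return b.startswith(a)


def is_proper_prefix(a: str, b: str) -> bool:
    """A prefix for which the remainder D is not the empty string."""
    return b.startswith(a) and len(a) < len(b)


sample = "¬(()∨)(("
print(occurrences("(", sample))        # [2, 3, 7, 8]
print(is_prefix("¬(", sample))         # True
print(is_proper_prefix(sample, sample))  # False
```

Concatenation is just `a + b`, and the empty string `""` behaves as λ does: `a + "" == "" + a == a` for every string `a`.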


† A set that supplies symbols to be used in building strings is not special. It is just a set. However,
it often has a special name: “alphabet”.

‡ Punctuation, such as “.”, is not part of the string. One often avoids such footnotes by quoting
strings that are explicitly written as symbol sequences. For example, if A stands for the string
#, one writes A ≡ “#”. Note that we must not write “A”, unless we mean a string whose only
symbol is A.

§ If and only if.

I.1.5 Definition (Terms). The set of terms, Term, is the smallest set of strings
over the alphabet 𝒱 with the following two properties:

(1) Any of the items in LS.1 or NLS.1 (x, y, z, a, b, c, etc.) are included.
(2) If f is a function† of arity n and t_1, t_2, ..., t_n are included, then so is the
string “f t_1 t_2 ... t_n”.


The symbols t, s, and u, with or without subscripts or primes, will denote
arbitrary terms. As they are used to describe the syntax of terms, we often call 
such symbols syntactic variables — which is synonymous with metavariables. 


I.1.6 Remark. (1) We often abuse notation and write f(t_1, ..., t_n) instead of
f t_1 ... t_n.

(2) Definition I.1.5 is an inductive definition.‡ It defines a more or less complicated
term by assuming that we already know what simpler terms look like.
This is a standard technique employed in real mathematics (within which we are
defining the formal language). We will have the opportunity to say more about
such inductive definitions, and their appropriateness, in a ☡☡ comment
later on.


(3) We relate this particular manner of defining terms to our working definition
of a theory (given on p. 7 immediately before Remark I.1.1 in terms
of “rules” of formation). Item (2) in I.1.5 essentially says that we build new
terms (from old ones) by applying the following general rule: Pick an arbitrary
function symbol, say f. This has a specific formation rule associated with it.
Namely, “for the appropriate number, n, of an already existing ordered list of
terms, t_1, ..., t_n, build the new term consisting of f, immediately followed by
the ordered list of the given terms”.

For example, suppose we are working in the language of number theory.
There is a function symbol + available there. The rule associated with + builds
the new term +ts for any prior obtained terms t and s. Thus, +v1v13 and
+v121+v1v13 are well-formed terms. We normally write terms of number theory
in infix notation,§ i.e., t + s, v1 + v13 and v121 + (v1 + v13) (note the intrusion of
brackets, to indicate sequencing in the application of +).

A by-product of what we have just described is that the arity of a function 
symbol f is whatever number of terms the associated rule will require as 
input. 
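
To see how a formation rule “requires” a fixed number of input terms, here is a minimal Python sketch that reads a term written in prefix notation and renders it in infix notation. Everything in it is an assumption made for illustration: symbols are given as a list of tokens, the table ARITY and the names read_term and to_infix are hypothetical, and only the number-theoretic function symbols S, + and × are covered.

```python
ARITY = {"S": 1, "+": 2, "×": 2}   # assumed function symbols with their arities
CONSTANTS = {"0"}


def is_variable(token: str) -> bool:
    # Variables are written v0, v1, v2, ... in this sketch.
    return len(token) > 1 and token[0] == "v" and token[1:].isdigit()


def read_term(tokens, pos=0):
    """Read one prefix term starting at tokens[pos].

    Returns (infix rendering, position just after the term)."""
    if pos >= len(tokens):
        raise ValueError("ran out of symbols")
    head = tokens[pos]
    if head in CONSTANTS or is_variable(head):
        return head, pos + 1
    if head not in ARITY:
        raise ValueError(f"{head!r} cannot start a term")
    args = []
    pos += 1
    for _ in range(ARITY[head]):       # the rule asks for exactly this many terms
        arg, pos = read_term(tokens, pos)
        args.append(arg)
    if len(args) == 2:
        return f"({args[0]} {head} {args[1]})", pos
    return f"{head}({args[0]})", pos


def to_infix(tokens):
    text, end = read_term(tokens)
    if end != len(tokens):
        raise ValueError("symbols left over after the term")
    return text


print(to_infix(["+", "v1", "v13"]))                # (v1 + v13)
print(to_infix(["+", "v121", "+", "v1", "v13"]))   # (v121 + (v1 + v13))
```

The prefix strings need no brackets because the arity table tells the reader where each argument ends; brackets appear only in the infix rendering.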


† We will omit from now on the qualification “symbol” from terminology such as “function symbol”,
“constant symbol”, “predicate symbol”.

‡ Some mathematicians are adamant that we call this a recursive definition and reserve the term
“induction” for “induction proofs”. This is seen to be unwarranted hairsplitting if we consider
that Bourbaki (1966b) calls induction proofs “démonstrations par récurrence”. We will be less
dogmatic: Either name is all right.

§ Function symbol placed between the arguments.
(4) A crucial word used in I.1.5 (which recurs in all inductive definitions) is
“smallest”. It means “least inclusive” (set). For example, we may easily think of
a set of strings that satisfies both conditions of the above definition, but which
is not “smallest” by virtue of having additional elements, such as the string

“¬¬(”.

Pause. Why is “¬¬(” not in the smallest set as defined above, and therefore
not a term?


The reader may wish to ponder further on the import of the qualification 
“smallest” by considering the familiar (similar) example of N. The principle of 
induction in N ensures that this set is the smallest with the properties 


(i) 0 is included, and
(ii) if n is included, then so is n + 1.


By contrast, all of Z (set of integers), Q (set of rational numbers), and R (set 
of real numbers) satisfy (i) and (ii), but they are clearly not the “smallest” such. 


☡


I.1.7 Definition (Atomic Formulas). The set of atomic formulas, Af, contains
precisely:


(1) The strings t = s for every possible choice of terms t, s.
(2) The strings P t_1 t_2 ... t_n for all possible choices of n-ary predicates P (for
all choices of n > 0) and all possible choices of terms t_1, t_2, ..., t_n.


☡ We often abuse notation and write P(t_1, ..., t_n) instead of P t_1 ... t_n. ☡


I.1.8 Definition (Well-Formed Formulas). The set of well-formed formulas,
Wff, is the smallest set of strings or expressions over the alphabet 𝒱 with the
following properties:


(a) All the members of Af are included.

(b) If 𝒜 and ℬ denote strings (over 𝒱) that are included, then (𝒜 ∨ ℬ) and
(¬𝒜) are also included.

(c) If 𝒜 is† a string that is included and x is any object variable (which may or
may not occur (as a substring) in the string 𝒜), then the string ((∃x)𝒜)
is also included. We say that 𝒜 is the scope of (∃x).


† Denotes.
I.1.9 Remark.

(1) The above is yet another inductive definition. Its statement (in the
metalanguage) is facilitated by the use of syntactic, or meta-, variables (𝒜 and
ℬ) used as names for arbitrary (indeterminate) formulas. We first encountered
the use of syntactic variables in Definition I.1.5.

In general, we will let calligraphic capital letters 𝒜, ℬ, 𝒞, 𝒟, ℰ, ℱ, 𝒢
(with or without primes or subscripts) be syntactic variables (i.e., metalinguistic
names) denoting well-formed formulas, or just formulas, as we often say. The
definition of Wff given above is standard. In particular, it permits well-formed
formulas such as ((∃x)((∃x)x = 0)) in the interest of making the formation
rules context-free.†

(2) The rules of syntax just given do not allow us to write things such as ∃f
or ∃P where f and P are function and predicate symbols respectively. That
quantification is deliberately restricted to act solely on object variables makes 
the language first order. 

(3) We have already indicated in Remark I.1.6 where the arities of function 
and predicate symbols come from (Definitions I.1.5 and I.1.7 referred to them). 
These are numbers that are implicit (“hardwired”) within the formation rules
for terms and atomic formulas. Each function, and each predicate symbol (e.g.,
+, ×, ∈, <) has its own unique associated formation rule. This rule “knows”
how many terms are needed (on the “input side”) in order to form a term or
atomic formula.

There is an alternative way of making arities of symbols known (in the 
metatheory): Rather than embedding arities in the formation rules, we can hide 
them inside the ontology of the symbols, not making them explicit in the name. 
For example, a new symbol, say *, can be used to record arity. That is, we can 
think of a predicate (or function) symbol as consisting of two parts: an arity 
part and an “all the rest” part, the latter needed to render the symbol unique. 
For example, ∈ may be actually the short name for the symbol “∈**”, where
this latter name is identical to the symbol it denotes, or “what you see is what
you get” (see Remark I.1.3(1) and (2), p. 10). The presence of the two asterisks
declares the arity. Some people say this differently: They make available to the
metatheory a “function”, ar, from “the set of all predicate and function symbols”
(of a given language) to the natural numbers, so that for any function symbol f
or predicate symbol P, ar(f) and ar(P) yield the arity of f or P respectively.‡


† In some presentations, formation rule I.1.8(c) is context-sensitive: It requires that x be not already
quantified in 𝒜.

‡ In mathematics we understand a function as a set of input-output pairs. One can “glue” the two
parts of such pairs together, as in “∈**”, where “∈” is the input part and “**” is the output part,
the latter denoting “2”, etc. Thus, the two approaches are equivalent.
(4) As a consequence of the remarks in (3) the theory can go about its job of
generating, say, terms using the formation rules, at the same time being unable 
to see or discuss these arities, since these are hidden inside the rules (or inside 
the function or predicate names in the alternative approach). So it is not in the 
theory’s “competence” to say, e.g., “hmm, this function has arity 10011132”. 
Indeed, a theory cannot even say “hmm, so this is a function (or a term, or a 
wff)”. A theory just generates strings. It does not test them for membership in
syntactic categories, such as variable, function, term, or wff. A human user of 
the theory, on the other hand, can, of course, make such observations. Indeed, 
in theories such as set theory and arithmetic, the human user can even write a 
computer program that correctly makes such observations. But both of these 
agents, human and computer, act in the metatheory. 

(5) Abbreviations: 


Abr1. The string ((∀x)𝒜) abbreviates the string (¬((∃x)(¬𝒜))). Thus, for
any explicitly written formula 𝒜, the former notation is informal
(metamathematical), while the latter is formal (within the formal language). In
particular, ∀ is a metalinguistic symbol. “∀x” is the universal quantifier.
𝒜 is its scope. The symbol ∀ is pronounced for all.


We also introduce — in the metalanguage — a number of additional Boolean 
connectives in order to abbreviate certain strings: 


Abr2. Conjunction, ∧. (𝒜 ∧ ℬ) stands for (¬((¬𝒜) ∨ (¬ℬ))). The symbol
∧ is pronounced and.

Abr3. Classical or material implication, →. (𝒜 → ℬ) stands for ((¬𝒜) ∨
ℬ). (𝒜 → ℬ) is pronounced if 𝒜, then ℬ.

Abr4. Equivalence, ↔. (𝒜 ↔ ℬ) stands for ((𝒜 → ℬ) ∧ (ℬ → 𝒜)).

Abr5. To minimize the use of brackets in the metanotation, we adopt standard
priorities of connectives, that is, ∀, ∃, and ¬ have the highest; then
we have (in decreasing order of priority) ∧, ∨, →, ↔; and we agree
not to use outermost brackets. All associativities are right, that is,
if we write 𝒜 → ℬ → 𝒞, then this is a (sloppy) counterpart for
(𝒜 → (ℬ → 𝒞)).
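
The abbreviations Abr1 through Abr4 amount to a mechanical translation from the informal notation into the formal language, which has only ¬, ∨ and ∃. A minimal Python sketch of that translation follows. The representation is an assumption made for illustration: formulas are nested tuples tagged "atom", "not", "or", "exists" (formal) and "forall", "and", "implies", "iff" (abbreviations), and the names expand and show are hypothetical.

```python
def expand(f):
    """Rewrite a formula so that only the formal connectives remain."""
    tag = f[0]
    if tag == "atom":
        return f
    if tag == "not":
        return ("not", expand(f[1]))
    if tag == "or":
        return ("or", expand(f[1]), expand(f[2]))
    if tag == "exists":
        return ("exists", f[1], expand(f[2]))
    if tag == "forall":                      # Abr1
        return ("not", ("exists", f[1], ("not", expand(f[2]))))
    if tag == "and":                         # Abr2
        return ("not", ("or", ("not", expand(f[1])), ("not", expand(f[2]))))
    if tag == "implies":                     # Abr3
        return ("or", ("not", expand(f[1])), expand(f[2]))
    if tag == "iff":                         # Abr4, via Abr2 and Abr3
        return expand(("and", ("implies", f[1], f[2]), ("implies", f[2], f[1])))
    raise ValueError(f"unknown tag {tag!r}")


def show(f):
    """Fully bracketed string for a formula that uses formal connectives only."""
    tag = f[0]
    if tag == "atom":
        return f[1]
    if tag == "not":
        return f"(¬{show(f[1])})"
    if tag == "or":
        return f"({show(f[1])} ∨ {show(f[2])})"
    if tag == "exists":
        return f"((∃{f[1]}){show(f[2])})"
    raise ValueError(f"{tag!r} is an abbreviation, expand it first")


A, B = ("atom", "A"), ("atom", "B")
print(show(expand(("forall", "x", A))))   # (¬((∃x)(¬A)))
print(show(expand(("and", A, B))))        # (¬((¬A) ∨ (¬B)))
print(show(expand(("implies", A, B))))    # ((¬A) ∨ B)
```

Here "A" and "B" merely stand in for atomic formulas; the output strings are exactly the right-hand sides given in Abr1 to Abr3.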


(6) The language just defined, L, is one-sorted, that is, it has a single sort or 
type of object variable. Is this not inconvenient? After all, our set theory will 
have both atoms and sets. In other theories, e.g., geometry, one has points, lines, 
and planes. One would have hoped to have different types of variables, one for 
each. 

Actually, to do this would amount to a totally unnecessary complication of 
syntax. We can (and will) get away with just one sort of object variable. For 
example, in set theory we will also introduce a 1-ary† predicate, U, whose job is
to test an object for “sethood”‡ (vs. atom status). Similar remedies are available
to other theories. For example, geometry will manage with one sort of variable,
and unary predicates “Point”, “Line”, and “Plane”.

Apropos language, some authors emphasize the importance of the nonlogical
symbols, taking at the same time the formation rules for granted; thus they say
that we have a language, say, “L = {∈, U}” rather than “L = (𝒱, Term, Wff)
where 𝒱 has ∈ and U as its only nonlogical symbols”. That is, they use
“language” for the nonlogical part of the alphabet.


☡☡ This comment requires some familiarity with elementary concepts, such
as BNF notation for grammar specification, encountered in a course on
formal languages and automata, or, alternatively, in language manuals for Algol-like
programming languages (such as Algol itself, Pascal, etc.); hence the ☡☡
sign.


We have said above “This rule ‘knows’ how many terms are needed (on 
the ‘input side’) in order to form a term or atomic formula.” We often like to 
personify rules, theories, and the like, to make the exposition more relaxed. 
This runs the danger of being misunderstood on occasion. Here is how a rule 
“knows”. 

Syntactic definitions in the part of theoretical computer science known as
formal language theory are given by a neat notation called BNF:§ To fix ideas,
let us say that we are describing the terms of a specific first order language that
contains just one constant symbol, “c”, and just two function symbols, “f” and
“g”, where we intend the former to be ternary (arity 3) and the latter of arity 5.
Moreover, assume that the variables v0, v1, ... are short names for vv, v|v, ...
respectively.

Then, using the syntactic names ⟨term⟩, ⟨var⟩, ⟨strokes⟩ to stand for any
term, any variable, any string of strokes, we can recursively define these syntactic
categories as follows, where we read “→” as “is defined as” (the right
hand side), and the big stroke, “|” (pronounced “or”), gives alternatives in the
definition (of the left hand side):


(1) ⟨strokes⟩ → λ | ⟨strokes⟩|
(2) ⟨var⟩ → v⟨strokes⟩v
(3) ⟨term⟩ → c | ⟨var⟩ | f⟨term⟩⟨term⟩⟨term⟩ | g⟨term⟩⟨term⟩⟨term⟩⟨term⟩⟨term⟩


† More usually called unary.

‡ People writing about, or teaching, set theory have made this word up. Of course, one means by
it the property of being a set.

§ Backus-Naur form. Rules (1)-(3) are in BNF. In particular the alternative symbol “|” is part of
BNF notation, and so is the ⟨...⟩ notation for the names of syntactic categories. The “→” has
many typographical variants, including “::=”.


For example, rule (1) says that a string of strokes is (defined as) either the empty
string λ, or a string of strokes followed by a single stroke.

Rule (3) shows clearly how the “knowledge” of the arities of f and g is 
“hardwired” within the rule. For example, the third alternative of that rule says 
that a term is a string composed of the symbol “f” followed immediately by 
three strings, each of which is a term. 
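
One way to watch the arities being “hardwired” is to turn rules (1)-(3) into a recognizer. The Python sketch below is an illustration, not part of the text: the function names are hypothetical, strings are ordinary Python strings over the characters c, f, g, v and |, and the table ARITY simply records the number of ⟨term⟩ slots that follow f and g in rule (3).

```python
ARITY = {"f": 3, "g": 5}   # number of <term> slots after f and g in rule (3)


def parse_var(s: str, i: int):
    """Rule (2): v, then zero or more strokes (rule (1)), then v.

    Returns the index just after the variable, or None if none starts at i."""
    if i < len(s) and s[i] == "v":
        j = i + 1
        while j < len(s) and s[j] == "|":
            j += 1
        if j < len(s) and s[j] == "v":
            return j + 1
    return None


def parse_term(s: str, i: int = 0):
    """Rule (3): return the index just after one term starting at i, or None."""
    if i >= len(s):
        return None
    if s[i] == "c":
        return i + 1
    if s[i] == "v":
        return parse_var(s, i)
    if s[i] in ARITY:
        j = i + 1
        for _ in range(ARITY[s[i]]):
            j = parse_term(s, j)
            if j is None:
                return None
        return j
    return None


def is_term(s: str) -> bool:
    return parse_term(s) == len(s)


print(is_term("fcvvv|v"))   # True: f applied to c, vv, v|v
print(is_term("fcc"))       # False: f needs three terms
```

The recognizer never consults a separate description of what f or g “mean”; the required number of arguments is visible only in the shape of the rule, which is the point of the remark above.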


A variable that is quantified is bound in the scope of the quantifier. Non- 
quantified variables are free. We also give below, by induction on formulas, 
precise (metamathematical) definitions of “free” and “bound”. 


I.1.10 Definition (Free and Bound Variables). An object variable x occurs
free in a term t or atomic formula 𝒜 iff it occurs in t or 𝒜 as a substring
(see I.1.4).


x occurs free in (¬𝒜) iff it occurs free in 𝒜.

x occurs free in (𝒜 ∨ ℬ) iff it occurs free in at least one of 𝒜 and ℬ.

x occurs free in ((∃y)𝒜) iff x occurs free in 𝒜 and y is not the same variable
as x.†

The y in ((∃y)𝒜) is, of course, not free (even if it might be so in 𝒜), as
we have just concluded in this inductive definition. We say that it is bound in
((∃y)𝒜). Trivially, terms and atomic formulas have no bound variables.
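
The inductive definition of “occurs free” can be followed clause by clause in code. The sketch below is an illustration under assumptions: formulas are nested tuples tagged "atom", "not", "or", "exists"; an atomic formula carries, besides its text, the set of variables occurring in it; and the function names are hypothetical.

```python
def free_vars(f) -> set:
    """Set of variables that occur free in the formula f."""
    tag = f[0]
    if tag == "atom":                 # every variable occurring in it is free
        return set(f[2])
    if tag == "not":
        return free_vars(f[1])
    if tag == "or":                   # free in at least one of the two parts
        return free_vars(f[1]) | free_vars(f[2])
    if tag == "exists":               # the quantified variable is removed
        return free_vars(f[2]) - {f[1]}
    raise ValueError(f"unknown tag {tag!r}")


def is_sentence(f) -> bool:
    """A formula is closed (a sentence) iff no variable occurs free in it."""
    return not free_vars(f)


x_eq_y = ("atom", "x = y", {"x", "y"})
x_eq_z = ("atom", "x = z", {"x", "z"})

print(free_vars(("exists", "y", x_eq_y)))                          # {'x'}
print(sorted(free_vars(("or", x_eq_y, ("exists", "x", x_eq_z)))))  # ['x', 'y', 'z']
print(is_sentence(("exists", "x", ("exists", "y", x_eq_y))))       # True
```

In the second example x is bound inside the right disjunct but still occurs free in the whole formula, because it occurs free in the left disjunct.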


I.1.11 Remark. (1) Of course, Definition I.1.10 takes care of the defined
connectives as well, via the obvious translation procedure.

(2) Notation. If 𝒜 is a formula, then we often write 𝒜[y_1, ..., y_k] to
indicate interest in the variables y_1, ..., y_k, which may or may not be free in
𝒜. There may be other free variables in 𝒜 that we may have chosen not to
include in the list. On the other hand, if we use round brackets, as in 𝒜(y_1, ..., y_k),
then we are implicitly asserting that y_1, ..., y_k is the complete list of free
variables that occur in 𝒜.

† Recall that x and y are abbreviations of names such as v1200099 and v11009 (which name distinct
variables). However, it could be that both x and y name v101. Therefore it is not redundant to say
“and y is not the same variable as x”. By the way, x ≢ y says the same thing, by I.1.4.


I.1.12 Definition. A term or formula is closed iff no free variables occur in it. 
A closed formula is called a sentence. 

A formula is open iff it contains no quantifiers (thus, an open formula may 
also be closed). 


I.2. A Digression into the Metatheory: Informal
Induction and Recursion


We have already seen a number of inductive or recursive definitions in Sec- 
tion I.1. The reader, most probably, has already seen or used such definitions 
elsewhere. 

We will organize the common important features of inductive definitions in 
this section for easy reference. We will revisit these issues, within the framework 
of formal set theory, in due course, but right now we need to ensure that our grasp 
of these notions and techniques, at the metamathematical level, is sufficient for 
our needs. 

One builds a set S by recursion, or inductively (or by induction), out of two
ingredients: a set of initial objects, ℐ, and a set of rules or operations, ℛ. A
member of ℛ (a rule) is a (possibly infinite) table, or relation, like

y_1   ...   y_n   z
a_1   ...   a_n   a_{n+1}
b_1   ...   b_n   b_{n+1}
 ⋮           ⋮     ⋮

If the above rule (table) is called Q, then we use the notations Q(a_1, ..., a_n,
a_{n+1}) and† ⟨a_1, ..., a_n, a_{n+1}⟩ ∈ Q interchangeably to indicate that the ordered
sequence or “row” a_1, ..., a_n, a_{n+1} is present in the table. We say “Q(a_1, ...,
a_n, a_{n+1}) holds” or “Q(a_1, ..., a_n, a_{n+1}) is true”, but we often also say that
“Q applied to a_1, ..., a_n yields a_{n+1}”, or that “a_{n+1} is a result or output of
Q, when the latter receives input a_1, ..., a_n”. We often abbreviate such inputs
using vector notation, namely, a⃗_n (or just a⃗, if n is understood). Thus, we often
write Q(a⃗_{n+1}) for Q(a_1, ..., a_n, a_{n+1}).

A rule Q that has n + 1 columns is called (n + 1)-ary.

† “x ∈ A” means that “x is a member of (or is in) A” in the informal set-theoretic sense.


I.2.1 Definition. We say “a set $T$ is closed under an $(n + 1)$-ary rule $Q$” to
mean that whenever $c_1, \ldots, c_n$ are all in $T$, then $d \in T$ for all $d$ satisfying
$Q(c_1, \ldots, c_n, d)$.
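As an illustration of Definition I.2.1, here is a minimal Python sketch for finite tables only. Representing a rule as a set of tuples, the name `is_closed_under`, and the example rule `add_mod_6` are assumptions made for this sketch; they do not come from the text.

```python
from itertools import product


def is_closed_under(T, Q):
    """Return True iff the set T is closed under the rule Q.

    Q is a finite (n + 1)-ary rule given as a set of rows (c_1, ..., c_n, d):
    the first n entries are the inputs, the last entry is the output.
    """
    return all(row[-1] in T
               for row in Q
               if all(c in T for c in row[:-1]))


# Hypothetical 3-ary rule: addition modulo 6, one row per pair of inputs.
add_mod_6 = {(x, y, (x + y) % 6) for x, y in product(range(6), repeat=2)}

print(is_closed_under({0, 2, 4}, add_mod_6))  # even residues: closed
print(is_closed_under({0, 1}, add_mod_6))     # 1 + 1 = 2 is missing: not closed
```

Only rows whose inputs all lie in $T$ impose a requirement, which is why the empty set is (vacuously) closed under any rule with at least one input column.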


With these preliminary understandings out of the way, we now state 


I.2.2 Definition. $S$ is defined by recursion, or by induction, from initial objects
$\mathcal{I}$ and set of rules $\mathcal{R}$, provided it is the smallest (least inclusive) set with the
properties

(1) $\mathcal{I} \subseteq S$,‡
(2) $S$ is closed under every $Q$ in $\mathcal{R}$. In this case we say that $S$ is $\mathcal{R}$-closed.

We write $S = \mathrm{Cl}(\mathcal{I}, \mathcal{R})$, and say that “$S$ is the closure of $\mathcal{I}$ under $\mathcal{R}$”.
We have at once:

I.2.3 Metatheorem (Induction on $S$). If $S = \mathrm{Cl}(\mathcal{I}, \mathcal{R})$ and if some set $T$
satisfies

(1) $\mathcal{I} \subseteq T$, and
(2) $T$ is closed under every $Q$ in $\mathcal{R}$,

then $S \subseteq T$.

Pause. Why is the above a metatheorem?


The above principle of induction on $S$ is often rephrased as follows: To prove
that a “property” $P(x)$ holds for all members of $\mathrm{Cl}(\mathcal{I}, \mathcal{R})$, just prove that

(a) Every member of $\mathcal{I}$ has the property, and

(b) The property propagates with every rule in $\mathcal{R}$, i.e., if $P(c_i)$ holds (is true)
for $i = 1, \ldots, n$, and if $Q(c_1, \ldots, c_n, d)$ holds, then $d$ too has property
$P(x)$, that is, $P(d)$ holds.


‡ From our knowledge of elementary informal set theory, we recall that $A \subseteq B$ means that every
member of $A$ is also a member of $B$.
Of course, this rephrased principle is valid, for if we let $T$ be the set of all
objects that have property $P(x)$ (for which set one employs the well-established
symbol $\{x : P(x)\}$), then this $T$ satisfies (1) and (2) of the metatheorem.†


I.2.4 Definition (Derivations and Parses). A $(\mathcal{I}, \mathcal{R})$-derivation, or simply
derivation (if $\mathcal{I}$ and $\mathcal{R}$ are understood), is a finite sequence of objects
$d_1, \ldots, d_n$ ($n \geq 1$) such that each $d_i$ is

(1) A member of $\mathcal{I}$, or‡
(2) For some $(r + 1)$-ary $Q \in \mathcal{R}$, $Q(d_{j_1}, \ldots, d_{j_r}, d_i)$ holds, and $j_l < i$ for
$l = 1, \ldots, r$.

We say that $d_i$ is derivable within $i$ steps.
A derivation of an object $a$ is also called a parse of $a$.

Trivially, if $d_1, \ldots, d_n$ is a derivation, then so is $d_1, \ldots, d_m$ for any $1 \leq m \leq n$.

If $d$ is derivable within $n$ steps, it is also derivable in $k$ steps or less for all
$k > n$, since we can lengthen a derivation arbitrarily by adding $\mathcal{I}$-elements to it.
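The two clauses of the definition of a derivation can be checked mechanically when the rules are finite tables. The following Python sketch is an illustration under that assumption; the function name `is_derivation`, the tuple representation of rules, and the finite fragment `succ` of the successor rule are all hypothetical choices, not part of the text.

```python
def is_derivation(seq, initial, rules):
    """Check that seq is a derivation from `initial` under finite `rules`.

    Each rule is a set of rows (a_1, ..., a_r, b). An entry d_i is justified
    if it is an initial object, or if some row (a_1, ..., a_r, d_i) has all
    of its inputs among the entries strictly to the left of d_i.
    """
    if not seq:
        return False  # a derivation has at least one entry
    for i, d in enumerate(seq):
        earlier = set(seq[:i])
        by_rule = any(row[-1] == d and all(a in earlier for a in row[:-1])
                      for Q in rules for row in Q)
        if d not in initial and not by_rule:
            return False
    return True


# Hypothetical finite fragment of the successor rule y = x + 1.
succ = {(x, x + 1) for x in range(10)}

print(is_derivation([0, 1, 2, 3], {0}, [succ]))  # each entry follows from its left neighbour
print(is_derivation([0, 2], {0}, [succ]))        # 2 has no justification
print(is_derivation([0, 0, 1], {0}, [succ]))     # padding with initial objects is allowed
```

The third call mirrors the remark that a derivation can be lengthened by adding initial objects, and any nonempty prefix of an accepted sequence is accepted as well.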


I.2.5 Remark. The following metatheorem shows that there is a way to “construct”
$\mathrm{Cl}(\mathcal{I}, \mathcal{R})$ iteratively, i.e., one element at a time by repeated application
of the rules.

This result shows definitively that our inductive definitions of terms (I.1.5)
and well-formed formulas (I.1.8) fully conform with our working definition of
theory, as an alphabet and a set of rules that are used to build formulas and
theorems (p. 7).


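I.2.6 Metatheorem.

$$\mathrm{Cl}(\mathcal{I}, \mathcal{R}) = \{x : x \text{ is } (\mathcal{I}, \mathcal{R})\text{-derivable within some number of steps, } n\}$$

Proof. For notational convenience let us write
$T = \{x : x \text{ is } (\mathcal{I}, \mathcal{R})\text{-derivable within some number of steps, } n\}$.

As we know from elementary naive set theory, we need to show here both
$\mathrm{Cl}(\mathcal{I}, \mathcal{R}) \subseteq T$ and $\mathrm{Cl}(\mathcal{I}, \mathcal{R}) \supseteq T$ to settle the claim.

$\subseteq$: We do induction on $\mathrm{Cl}(\mathcal{I}, \mathcal{R})$ (using I.2.3). Now $\mathcal{I} \subseteq T$, since every
member of $\mathcal{I}$ is derivable in $n = 1$ step (why?).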


† We are sailing too close to the wind here. It turns out that not all properties $P(x)$ lead to sets
$\{x : P(x)\}$. Our explanation was naive. However, formal set theory, which is meant to save us
from our naiveté, upholds the principle (a)-(b) using just a slightly more complicated explanation.
The reader can see this explanation in Chapter VII.

‡ This “or” is inclusive: (1), or (2), or both.
Also, $T$ is closed under every $Q$ in $\mathcal{R}$. Indeed, let such an $(r + 1)$-ary $Q$ be
chosen. Let

$$Q(a_1, \ldots, a_r, b) \tag{i}$$

and $\{a_1, \ldots, a_r\} \subseteq T$. Thus, each $a_i$ has a $(\mathcal{I}, \mathcal{R})$-derivation. Concatenate all
these derivations:

$$\ldots, a_1, \ldots, a_2, \ldots, \ldots, a_r$$

The above is a derivation (why?). But then, so is

$$\ldots, a_1, \ldots, a_2, \ldots, \ldots, a_r, b$$

by (i). Thus, $b \in T$.

$\supseteq$: We argue this (that is, if $d \in T$, then $d \in \mathrm{Cl}(\mathcal{I}, \mathcal{R})$) by induction on
the number of steps, $n$, in which $d$ is derivable.

For $n = 1$ we have $d \in \mathcal{I}$ and we are done, since $\mathcal{I} \subseteq \mathrm{Cl}(\mathcal{I}, \mathcal{R})$.

Let us make the induction hypothesis (I.H.) that for derivations of $\leq n$ steps
the claim is true.

Let then $d$ be derivable within $n + 1$ steps. Thus, there is a derivation
$a_1, \ldots, a_n, d$.

Now, if $d \in \mathcal{I}$, we are done as above (is this a “real case”?).

If on the other hand $Q(a_{j_1}, \ldots, a_{j_r}, d)$, then, for $i = 1, \ldots, r$, we have
$a_{j_i} \in \mathrm{Cl}(\mathcal{I}, \mathcal{R})$ by the I.H.; hence $d \in \mathrm{Cl}(\mathcal{I}, \mathcal{R})$, since the closure is closed
under all $Q \in \mathcal{R}$.


I.2.7 Example. One can see now that $\mathbb{N} = \mathrm{Cl}(\mathcal{I}, \mathcal{R})$, where $\mathcal{I} = \{0\}$ and $\mathcal{R}$
contains just the relation $y = x + 1$ (input $x$, output $y$). Similarly, $\mathbb{Z}$, the set
of all integers, is $\mathrm{Cl}(\mathcal{I}, \mathcal{R})$, where $\mathcal{I} = \{0\}$ and $\mathcal{R}$ contains just the relations
$y = x + 1$ and $y = x - 1$ (input $x$, output $y$).

For the latter, the inclusion $\mathrm{Cl}(\mathcal{I}, \mathcal{R}) \subseteq \mathbb{Z}$ is trivial (by I.2.3). For $\supseteq$ we easily
see that any $n \in \mathbb{Z}$ has a $(\mathcal{I}, \mathcal{R})$-derivation (and then we are done by I.2.6).
For example, if $n > 0$, then $0, 1, 2, \ldots, n$ is a derivation, while if $n < 0$, then
$0, -1, -2, \ldots, n$ is one. If $n = 0$, then the one-term sequence $0$ is a derivation.

Another interesting closure is obtained by $\mathcal{I} = \{3\}$ and the two relations
$z = x + y$ and $z = x - y$. This is the set $\{3k : k \in \mathbb{Z}\}$ (see Exercise I.1).

Pause. So, taking the first sentence of I.2.7 one step further, we note that we
have just proved the induction principle for $\mathbb{N}$, for that is exactly what the
“equation” $\mathbb{N} = \mathrm{Cl}(\mathcal{I}, \mathcal{R})$ says (by I.2.3). Do you agree?


There is another way to view the iterative construction of $\mathrm{Cl}(\mathcal{I}, \mathcal{R})$: The
set is constructed in stages. Below we are using some more notation borrowed
from informal set theory. For any sets $A$ and $B$ we write $A \cup B$ to indicate the
set union which consists of all the members found in $A$ or $B$ or in both. More
generally, if we have a lot of sets, $X_0, X_1, X_2, \ldots$, that is, one $X_i$ for every
integer $i \geq 0$ (which we denote by the compact notation $(X_i)_{i \geq 0}$), then we
may wish to form a set that includes all the objects found as members all over
the $X_i$, that is (using inclusive, or “logical”, “or”s below), form

$$\{x : x \in X_0 \text{ or } x \in X_1 \text{ or } \ldots\}$$

or, more elegantly and precisely,

$$\{x : \text{for some } i \geq 0,\ x \in X_i\}$$

The latter is called the union of the sequence $(X_i)_{i \geq 0}$ and is often denoted by

$$\bigcup_{i=0}^{\infty} X_i \quad\text{or}\quad \bigcup_{i \geq 0} X_i$$

Correspondingly, we write

$$\bigcup_{i=0}^{n} X_i \quad\text{or}\quad \bigcup_{i \leq n} X_i$$

if we only want to take a finite union, also indicated clumsily as $X_0 \cup \cdots \cup X_n$.


I.2.8 Definition (Stages). In connection with $\mathrm{Cl}(\mathcal{I}, \mathcal{R})$ we define the sequence
of sets $(X_i)_{i \geq 0}$ by induction on $n$, as follows:

$$X_0 = \mathcal{I}$$

$$X_{n+1} = \Bigl(\bigcup_{i \leq n} X_i\Bigr) \cup \Bigl\{ b : \text{for some } Q \in \mathcal{R} \text{ and some } \vec{a} \text{ in } \bigcup_{i \leq n} X_i,\ Q(\vec{a}, b) \Bigr\}$$

That is, to form $X_{n+1}$ we append to $\bigcup_{i \leq n} X_i$ all the outputs of all the relations
in $\mathcal{R}$ acting on all possible inputs, the latter taken from $\bigcup_{i \leq n} X_i$.
We say that $X_i$ is built at stage $i$, from initial objects $\mathcal{I}$ and rule set $\mathcal{R}$.

In words, at stage 0 we are given the initial objects ($X_0 = \mathcal{I}$). At stage 1 we
apply all possible relations to all possible objects that we have so far (they
form the set $X_0$) and build the first stage set, $X_1$, by appending the outputs to
what we have so far. At stage 2 we apply all possible relations to all possible
objects that we have so far (they form the set $X_0 \cup X_1$) and build the second
stage set, $X_2$, by appending the outputs to what we have so far. And so on.
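The stage-by-stage construction can be run directly when the initial set and the rule tables are finite. The Python sketch below is only an illustration under that finiteness assumption; the generator name `stages` and the two modular rules are hypothetical. It borrows the example of the initial set $\{3\}$ with addition and subtraction, but works modulo 12 so that the process stops.

```python
from itertools import product


def stages(initial, rules):
    """Yield X_0, X_1, ... until a stage adds nothing new.

    Rules are finite sets of rows (a_1, ..., a_r, b). Because each stage
    contains all earlier ones, `so_far` is both the latest stage and the
    union of all stages built so far.
    """
    so_far = set(initial)
    yield set(so_far)                      # X_0 = the initial objects
    while True:
        outputs = {row[-1]
                   for Q in rules for row in Q
                   if all(a in so_far for a in row[:-1])}
        nxt = so_far | outputs             # X_{n+1}
        if nxt == so_far:
            return                         # nothing new: the closure is reached
        so_far = nxt
        yield set(so_far)


# Hypothetical finite rules: z = x + y and z = x - y, both modulo 12.
add12 = {(x, y, (x + y) % 12) for x, y in product(range(12), repeat=2)}
sub12 = {(x, y, (x - y) % 12) for x, y in product(range(12), repeat=2)}

for n, X in enumerate(stages({3}, [add12, sub12])):
    print(n, sorted(X))
```

With these rules the stages are $\{3\}$, $\{0, 3, 6\}$, $\{0, 3, 6, 9\}$, and the last stage is the closure: the multiples of 3 modulo 12.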
When we work in the metatheory, we take for granted that we can have
simple inductive definitions on natural numbers. The reader is familiar with
several such definitions, e.g.,

$$a^0 = 1 \quad (\text{for } a \neq 0 \text{ throughout})$$

$$a^{n+1} = a \cdot a^n$$

We will (meta)prove a general theorem on the feasibility of recursive definitions
later on (I.2.13).


The following theorem connects stages and closures. 


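I.2.9 Metatheorem. With the $X_i$ as in I.2.8,

$$\mathrm{Cl}(\mathcal{I}, \mathcal{R}) = \bigcup_{i \geq 0} X_i$$

Proof. $\subseteq$: We do induction on $\mathrm{Cl}(\mathcal{I}, \mathcal{R})$. For the basis, $\mathcal{I} = X_0 \subseteq \bigcup_{i \geq 0} X_i$.

We show that $\bigcup_{i \geq 0} X_i$ is $\mathcal{R}$-closed. Let $Q \in \mathcal{R}$ and $Q(\vec{a}_n, b)$ hold, for some
$\vec{a}_n$ in $\bigcup_{i \geq 0} X_i$. Thus, by definition of union, there are integers $j_1, j_2, \ldots, j_n$
such that $a_i \in X_{j_i}$, $i = 1, \ldots, n$. If $k = \max\{j_1, \ldots, j_n\}$, then $\vec{a}_n$ is in $\bigcup_{i \leq k} X_i$;
hence $b \in X_{k+1} \subseteq \bigcup_{i \geq 0} X_i$.

$\supseteq$: It suffices to prove that $X_n \subseteq \mathrm{Cl}(\mathcal{I}, \mathcal{R})$, a fact we can prove by induction
on $n$. For $n = 0$ it holds by I.2.2. As an I.H. we assume the claim for all $n \leq k$.

The case for $k + 1$: $X_{k+1}$ is the union of two sets. One is $\bigcup_{i \leq k} X_i$. This is
a subset of $\mathrm{Cl}(\mathcal{I}, \mathcal{R})$ by the I.H. The other is

$$\Bigl\{ b : \text{for some } Q \in \mathcal{R} \text{ and some } \vec{a} \text{ in } \bigcup_{i \leq k} X_i,\ Q(\vec{a}, b) \Bigr\}$$

This too is a subset of $\mathrm{Cl}(\mathcal{I}, \mathcal{R})$, by the preceding observation and the fact that
$\mathrm{Cl}(\mathcal{I}, \mathcal{R})$ is $\mathcal{R}$-closed.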


An inductively defined set can be built by stages.


I.2.10 Definition (Immediate Predecessors; Ambiguity). If $d \in \mathrm{Cl}(\mathcal{I}, \mathcal{R})$
and for some $Q$ and $a_1, \ldots, a_r$ it is the case that $Q(a_1, \ldots, a_r, d)$, then the
$a_1, \ldots, a_r$ are immediate $Q$-predecessors of $d$, or just immediate predecessors
if $Q$ is understood; for short, i.p.

A pair $(\mathcal{I}, \mathcal{R})$ is called ambiguous if some $d \in \mathrm{Cl}(\mathcal{I}, \mathcal{R})$ satisfies any
(or all) of the following conditions:

(i) It has two (or more) distinct sets of immediate $P$-predecessors for some
rule $P$.
(ii) It has both immediate $P$-predecessors and immediate $Q$-predecessors, for
$P \neq Q$.
(iii) It is a member of $\mathcal{I}$, yet it has immediate predecessors.

If $(\mathcal{I}, \mathcal{R})$ is not ambiguous, then it is unambiguous.


I.2.11 Example. The pair $(\{00, 0\}, \{Q\})$, where $Q(x, y, z)$ holds iff $z = xy$
(where “$xy$” denotes the concatenation of the strings $x$ and $y$, in that order), is
ambiguous. For example, 0000 has the two immediate predecessor sets $\{00, 00\}$
and $\{0, 000\}$. Moreover, while 00 is an initial object, it does have immediate
predecessors, namely, the set $\{0, 0\}$ (or, what amounts to the same thing, $\{0\}$).
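The ambiguity of the concatenation rule can be seen by listing every way of splitting a string into two nonempty parts. This is a small Python sketch for illustration; the helper name `predecessor_pairs` is hypothetical. It relies on the observation that the closure of $\{00, 0\}$ under concatenation is the set of all nonempty strings of 0s, so both halves of any split again lie in the closure.

```python
def predecessor_pairs(s):
    """All pairs (x, y) of nonempty strings with x + y == s."""
    return [(s[:k], s[k:]) for k in range(1, len(s))]


pairs = predecessor_pairs("0000")
print(pairs)

# Viewed as sets of immediate predecessors, (0, 000) and (000, 0) coincide.
print({frozenset(p) for p in pairs})

# The initial object 00 itself has immediate predecessors.
print(predecessor_pairs("00"))
```

Two distinct predecessor sets for 0000 witness condition (i) of the definition of ambiguity, and the split of 00 witnesses condition (iii).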


I.2.12 Example. The pair $(\mathcal{I}, \mathcal{R})$ where $\mathcal{I} = \{3\}$ and $\mathcal{R}$ consists of $z = x + y$
and $z = x - y$ is ambiguous. Even 3 has (infinitely many) distinct sets of i.p.
(e.g., any $\{a, b\}$ such that $a + b = 3$, or $a - b = 3$).

The pairs that effect the definition of Term (I.1.5) and Wff (I.1.8) are
unambiguous (see Exercises I.2 and I.3).


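I.2.13 Metatheorem (Definition by Recursion). Let $(\mathcal{I}, \mathcal{R})$ be unambiguous,
and $\mathrm{Cl}(\mathcal{I}, \mathcal{R}) \subseteq A$, where $A$ is some set. Let also $Y$ be a set, and† $h : \mathcal{I} \to Y$
and $g_Q$, for each $Q \in \mathcal{R}$, be given functions. For any $(r + 1)$-ary $Q$, an input for
the function $g_Q$ is a sequence $\langle a, b_1, \ldots, b_r \rangle$, where $a$ is in $A$ and the $b_1, \ldots, b_r$
are all in $Y$. All the $g_Q$ yield outputs in $Y$.

Under these assumptions, there is a unique function $f : \mathrm{Cl}(\mathcal{I}, \mathcal{R}) \to Y$
such that

$$
y = f(x) \quad\text{iff}\quad
\begin{cases}
y = h(x) \text{ and } x \in \mathcal{I} \\
\text{or, for some } Q \in \mathcal{R}, \\
y = g_Q(x, o_1, \ldots, o_r) \text{ and } Q(a_1, \ldots, a_r, x) \text{ holds,} \\
\text{where } o_i = f(a_i) \text{ for } i = 1, \ldots, r
\end{cases}
\tag{1}
$$

The reader may wish to skip the proof on first reading.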


Proof. Existence part. For each $(r + 1)$-ary $Q \in \mathcal{R}$, define $\hat{Q}$ by‡

$$\hat{Q}\bigl(\langle a_1, o_1 \rangle, \ldots, \langle a_r, o_r \rangle, \langle b, g_Q(b, o_1, \ldots, o_r) \rangle\bigr) \quad\text{iff}\quad Q(a_1, \ldots, a_r, b) \tag{2}$$

† The notation $f : A \to B$ is common in informal (and formal) mathematics. It denotes a function
$f$ that receives “inputs” from the set $A$ and yields “outputs” in the set $B$.

‡ For a relation $Q$, writing just “$Q(a_1, \ldots, a_r, b)$” is equivalent to writing “$Q(a_1, \ldots, a_r, b)$ holds”.
For any $a_1, \ldots, a_r, b$, the above definition of $\hat{Q}$ is effected for all possible
choices of $o_1, \ldots, o_r$ such that $g_Q(b, o_1, \ldots, o_r)$ is defined.

Collect now all the $\hat{Q}$ to form a set of rules $\hat{\mathcal{R}}$.

Let also $\hat{\mathcal{I}} = \{\langle x, h(x) \rangle : x \in \mathcal{I}\}$.

We will verify that the set $F = \mathrm{Cl}(\hat{\mathcal{I}}, \hat{\mathcal{R}})$ is a 2-ary relation that for every
input yields at most one output, and therefore is a function. For such a relation
it is customary to write, letting the context fend off the obvious ambiguity in
the use of the letter $F$,

$$y = F(x) \quad\text{iff}\quad F(x, y) \tag{$*$}$$

We will further verify that replacing $f$ in (1) above by $F$ results in a valid
equivalence (the “iff” holds). That is, $F$ satisfies (1).


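(a) We establish that $F$ is a relation composed of pairs $\langle x, y \rangle$ ($x$ is input, $y$ is
output) where $x \in \mathrm{Cl}(\mathcal{I}, \mathcal{R})$ and $y \in Y$. This follows easily by induction
on $F$ (I.2.3), since $\hat{\mathcal{I}} \subseteq F$, and the property (of containing such pairs)
propagates with each $\hat{Q}$ (recall that the $g_Q$ yield outputs in $Y$).

(b) We next show “if $\langle x, y \rangle \in F$ and $\langle x, z \rangle \in F$, then $y = z$”, that is, $F$ is
single-valued, or well defined, in short, it is a function. We again employ
induction on $F$, thinking of the quoted statement as a “property” of the
pair $\langle x, y \rangle$: Suppose that $\langle x, y \rangle \in \hat{\mathcal{I}}$ and let also $\langle x, z \rangle \in F$. By I.2.6,
$\langle x, z \rangle \in \hat{\mathcal{I}}$ or $\hat{Q}(\langle a_1, o_1 \rangle, \ldots, \langle a_r, o_r \rangle, \langle x, z \rangle)$, where $Q(a_1, \ldots, a_r, x)$ and
$z = g_Q(x, o_1, \ldots, o_r)$, for some $(r + 1)$-ary $Q$ and $\langle a_1, o_1 \rangle, \ldots, \langle a_r, o_r \rangle$
in $F$. The right hand side of the italicized “or” cannot hold for an
unambiguous $(\mathcal{I}, \mathcal{R})$, since $x$ cannot have i.p. Thus $\langle x, z \rangle \in \hat{\mathcal{I}}$, hence
$y = h(x) = z$. To prove that the property propagates with each $\hat{Q}$, let

$$\hat{Q}(\langle a_1, o_1 \rangle, \ldots, \langle a_r, o_r \rangle, \langle x, y \rangle)$$

but also

$$\hat{P}(\langle b_1, o'_1 \rangle, \ldots, \langle b_l, o'_l \rangle, \langle x, z \rangle)$$

where $Q(a_1, \ldots, a_r, x)$, $P(b_1, \ldots, b_l, x)$, and

$$y = g_Q(x, o_1, \ldots, o_r) \quad\text{and}\quad z = g_P(x, o'_1, \ldots, o'_l) \tag{3}$$

Since $(\mathcal{I}, \mathcal{R})$ is unambiguous, $Q = P$ (hence also $\hat{Q} = \hat{P}$), $r = l$, and
$a_i = b_i$, for $i = 1, \ldots, r$. By the I.H., $o_i = o'_i$, for $i = 1, \ldots, r$; hence
$y = z$ by (3).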
(c) Finally, we show that $F$ satisfies (1). We do induction on $\mathrm{Cl}(\hat{\mathcal{I}}, \hat{\mathcal{R}})$, to
prove $\leftarrow$: If $x \in \mathcal{I}$ and $y = h(x)$, then $F(x, y)$ (i.e., $y = F(x)$ in the
alternative notation ($*$)), since $\hat{\mathcal{I}} \subseteq F$. Let next $y = g_Q(x, o_1, \ldots, o_r)$
and $Q(a_1, \ldots, a_r, x)$, where also $F(a_i, o_i)$, for $i = 1, \ldots, r$. By (2),
$\hat{Q}(\langle a_1, o_1 \rangle, \ldots, \langle a_r, o_r \rangle, \langle x, g_Q(x, o_1, \ldots, o_r) \rangle)$; thus ($F$ being closed under
all the rules in $\hat{\mathcal{R}}$) $F(x, g_Q(x, o_1, \ldots, o_r))$ holds, in short, $F(x, y)$ or
$y = F(x)$. For $\rightarrow$, now we assume that $F(x, y)$ holds and we want to infer
the right hand side (of iff) in (1). We employ Metatheorem I.2.6.

Case 1. Let $\langle x, y \rangle$ be $F$-derivable† in $n = 1$ step. Then $\langle x, y \rangle \in \hat{\mathcal{I}}$. Thus
$y = h(x)$.

Case 2. Suppose next that $\langle x, y \rangle$ is $F$-derivable within $n + 1$ steps, namely,
we have a derivation

$$\langle x_1, y_1 \rangle, \langle x_2, y_2 \rangle, \ldots, \langle x_n, y_n \rangle, \langle x, y \rangle \tag{4}$$

where $\hat{Q}(\langle a_1, o_1 \rangle, \ldots, \langle a_r, o_r \rangle, \langle x, y \rangle)$ and $Q(a_1, \ldots, a_r, x)$ (see (2)), and
each of $\langle a_1, o_1 \rangle, \ldots, \langle a_r, o_r \rangle$ appears in the above derivation, to the left of
$\langle x, y \rangle$. This entails (by (2)) that $y = g_Q(x, o_1, \ldots, o_r)$. Since the $\langle a_i, o_i \rangle$
appear in (4), $F(a_i, o_i)$ holds for $i = 1, \ldots, r$. Thus, $\langle x, y \rangle$ satisfies the
right hand side of iff in (1), once more.


Uniqueness part. Let the function $K$ also satisfy (1). We show, by induction
on $\mathrm{Cl}(\mathcal{I}, \mathcal{R})$, that

$$\text{for all } x \in \mathrm{Cl}(\mathcal{I}, \mathcal{R}) \text{ and all } y \in Y, \quad y = F(x) \text{ iff } y = K(x) \tag{5}$$

$\rightarrow$: Let $x \in \mathcal{I}$, and $y = F(x)$. By lack of ambiguity, the case conditions
of (1) are mutually exclusive. Thus, it must be that $y = h(x)$. But then, $y = K(x)$
as well, since $K$ satisfies (1) too.

Let now $Q(a_1, \ldots, a_r, x)$ and $y = F(x)$. By (1), there are (unique, as
we now know) $o_1, \ldots, o_r$ such that $o_i = F(a_i)$ for $i = 1, \ldots, r$, and
$y = g_Q(x, o_1, \ldots, o_r)$. By the I.H., $o_i = K(a_i)$. But then (1) yields $y = K(x)$ as
well (since $K$ satisfies (1)).

$\leftarrow$: Just interchange the letters $F$ and $K$ in the above argument.


The above clearly is valid for functions $h$ and $g_Q$ that may fail to be defined
everywhere in their “natural” input sets. To be able to have this degree of 
generality without having to state additional definitions (such as those of left 
fields, right fields, partial functions, total functions, nontotal functions, and 
Kleene weak equality) we have stated the recurrence (1) the way we did (to 


† $\mathrm{Cl}(\hat{\mathcal{I}}, \hat{\mathcal{R}})$-derivable.
keep an eye on both the input and output side of things) rather than the usual

$$
f(x) =
\begin{cases}
h(x) & \text{if } x \in \mathcal{I} \\
g_Q(x, f(a_1), \ldots, f(a_r)) & \text{if } Q(a_1, \ldots, a_r, x) \text{ holds}
\end{cases}
$$

Of course, if all the $g_Q$ and $h$ are defined everywhere on their input sets (i.e.,
they are “total”), then $f$ is defined everywhere on $\mathrm{Cl}(\mathcal{I}, \mathcal{R})$ (see Exercise I.4).


I.3. Axioms and Rules of Inference 


Now that we have our language L, we will embark on using it to formally 
effect deductions. These deductions start at the axioms. Deductions employ 
“acceptable”, purely syntactic — i.e., based on form, not on substance — rules 
that allow us to write a formula down (to deduce it) solely because certain other 
formulas that are syntactically related to it were already deduced (i.e., already 
written down). These string-manipulation rules are called rules of inference. 
We describe in this section the axioms and the rules of inference that we will 
accept into our logical calculus and that are common to all theories. 

We start with a precise definition of tautologies in our first order language L. 


I.3.1 Definition (Prime Formulas in Wff. Propositional Variables). A formula
$\mathcal{A} \in$ Wff is a prime formula or a propositional variable iff it is either

Pri1. atomic, or
Pri2. a formula of the form $((\exists x)\mathcal{A})$.

We use the lowercase letters $p, q, r$ (with or without subscripts or primes) to
denote arbitrary prime formulas (propositional variables) of our language.

That is, a prime formula either has no propositional connectives, or if it does,
it hides them inside the scope of $(\exists x)$.

We may think of a propositional variable as a “blob of ink” that is all that a
myopic being makes out of a formula described in I.3.1. The same being will
see an arbitrary well-formed formula as a bunch of blobs, brackets and Boolean
connectives ($\neg$, $\lor$), “correctly connected” as stipulated below.†


I.3.2 Definition (Propositional Formulas). The set of propositional formulas
over $L$, denoted here by Prop, is the smallest set such that:

(1) Every propositional variable (over $L$) is in Prop.
(2) If $\mathcal{A}$ and $\mathcal{B}$ are in Prop, then so are $(\neg\mathcal{A})$ and $(\mathcal{A} \lor \mathcal{B})$.

† Interestingly, our myope can see the brackets and the Boolean connectives.
I.3.3 Metatheorem. Prop = Wff.

Proof. $\subseteq$: We do induction on Prop. Every item in I.3.2(1) is in Wff. Wff
satisfies I.3.2(2) (see I.1.8(b)). Done.

$\supseteq$: We do induction on Wff. Every item in I.1.8(a) is a propositional variable
(over $L$), and hence is in Prop.

Prop trivially satisfies I.1.8(b). It also satisfies I.1.8(c), for if $\mathcal{A}$ is in Prop,
then it is in Wff by the $\subseteq$-direction above. Then, by I.3.1, $((\exists x)\mathcal{A})$ is a
propositional variable, and hence in Prop.

We are done once more.


I.3.4 Definition (Propositional Valuations). We can arbitrarily assign a value
of 0 or 1 to every $\mathcal{A}$ in Wff (or Prop) as follows:

(1) We fix an assignment of 0 or 1 to every prime formula. We can think of this
as an arbitrary but fixed function $v : \{\text{all prime formulas over } L\} \to \{0, 1\}$
in the metatheory.

(2) We define by recursion an extension of $v$, denoted by $\bar{v}$:

$$\bar{v}((\neg\mathcal{A})) = 1 - \bar{v}(\mathcal{A})$$

$$\bar{v}((\mathcal{A} \lor \mathcal{B})) = \bar{v}(\mathcal{A}) \cdot \bar{v}(\mathcal{B})$$

where “$\cdot$” above denotes number multiplication.

We call, traditionally, the values 0 and 1 by the names “true” and “false”
respectively, and write t and f respectively.

We also call a valuation $v$ a truth (value) assignment.

We use the jargon “$\mathcal{A}$ takes the truth value t (respectively, f) under a
valuation $v$” to mean “$\bar{v}(\mathcal{A}) = 0$ (respectively, $\bar{v}(\mathcal{A}) = 1$)”.


The above inductive definition of $\bar{v}$ relies on the fact that Definition I.3.2 of
Prop is unambiguous (I.2.10, p. 25), so that a propositional formula is uniquely
readable (or parsable) (see Exercises I.5 and I.6). It employs the metatheorem
on recursive definitions (I.2.13).
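The recursion that extends a valuation from prime formulas to all propositional formulas can be sketched in a few lines of Python. The nested-tuple encoding of formulas and the function name `value` are assumptions made for this illustration. The sketch keeps the text's convention that 0 stands for t and 1 for f, which is why disjunction is computed by multiplication.

```python
def value(formula, v):
    """Extend the assignment v (dict: prime formula -> 0 or 1) to formulas.

    Convention of the text: 0 stands for t (true) and 1 for f (false).
    Hypothetical encoding: a string is a prime formula, ("not", A) is (¬A),
    and ("or", A, B) is (A ∨ B).
    """
    if isinstance(formula, str):
        return v[formula]
    if formula[0] == "not":
        return 1 - value(formula[1], v)
    if formula[0] == "or":
        return value(formula[1], v) * value(formula[2], v)
    raise ValueError(f"not a propositional formula: {formula!r}")


v = {"p": 0, "q": 1}                      # p is true, q is false
print(value(("or", "p", "q"), v))         # 0: a disjunction with a true disjunct
print(value(("or", ("not", "p"), "q"), v))  # 1: both disjuncts are false
```

Because each tuple records its own outermost connective, the encoding is uniquely readable, so the recursion visits exactly one case per formula.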

The reader may think that all this about unique readability is just an annoying 
quibble. Actually it can be a matter of life or death. The ancient Oracle of 
Delphi had the nasty habit of issuing ambiguous — not uniquely readable, that 
is — pronouncements. One famous such pronouncement, rendered in English, 
went like this: “You will go you will return not dying in the war”.† Given that
ancient Greeks did not use punctuation, the above has two diametrically opposite
meanings depending on whether you put a comma before or after “not”.

† The original was “Ιξεις αφιξεις ου θνηξεις εν πολεμω”.


The situation with formulas in Prop would have been as disastrous in the
absence of brackets (which serve as punctuation) because unique readability
would then not be guaranteed: For example, for three distinct prime formulas
$p, q, r$ we could find a $v$ such that $\bar{v}(p \to q \to r)$ depended on whether we meant
to insert brackets around “$p \to q$” or around “$q \to r$” (can you find such a $v$?).
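A brute-force search over the eight assignments shows which valuations separate the two bracketings. The Python sketch below is an illustration that assumes the usual abbreviation of $\mathcal{A} \to \mathcal{B}$ as $((\neg\mathcal{A}) \lor \mathcal{B})$ and keeps the text's convention 0 = t, 1 = f; the helper name `implies` is hypothetical.

```python
from itertools import product


def implies(a, b):
    """Value of (A → B), read as ((¬A) ∨ B), with 0 = t and 1 = f."""
    return (1 - a) * b


# Collect the assignments (p, q, r) on which the two bracketings disagree.
differing = [(p, q, r)
             for p, q, r in product((0, 1), repeat=3)
             if implies(implies(p, q), r) != implies(p, implies(q, r))]

print(differing)
```

The two readings disagree exactly when $p$ and $r$ both take the value f (that is, 1), whatever the value of $q$.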


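I.3.5 Remark (Truth Tables). Definition I.3.4 is often given in terms of truth
functions. For example, we could have defined (in the metatheory, of course)
the function $F_\neg : \{t, f\} \to \{t, f\}$ by

$$
F_\neg(x) =
\begin{cases}
t & \text{if } x = f \\
f & \text{if } x = t
\end{cases}
$$

We could then say that $\bar{v}((\neg\mathcal{A})) = F_\neg(\bar{v}(\mathcal{A}))$. One can similarly take care of
all the connectives ($\lor$ and all the abbreviations) with the help of truth functions
$F_\lor, F_\land, F_\to, F_\leftrightarrow$. These functions are conveniently given via so-called truth
tables as indicated below:

$$
\begin{array}{cc|c|c|c|c|c}
x & y & F_\neg(x) & F_\lor(x, y) & F_\land(x, y) & F_\to(x, y) & F_\leftrightarrow(x, y) \\ \hline
f & f & t & f & f & t & t \\
f & t & t & t & f & t & f \\
t & f & f & t & f & f & f \\
t & t & f & t & t & t & t
\end{array}
$$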


I.3.6 Definition (Tautologies, Satisfiable Formulas, Unsatisfiable Formulas
in Wff). A formula $\mathcal{A} \in$ Wff (equivalently, in Prop) is a tautology iff for all
valuations $v$, $\bar{v}(\mathcal{A}) = t$.

We call the set of all tautologies, as defined here, Taut. The symbol $\models_{\mathrm{taut}} \mathcal{A}$
says “$\mathcal{A}$ is in Taut”.

A formula $\mathcal{A} \in$ Wff (equivalently, in Prop) is satisfiable iff for some
valuation $v$, $\bar{v}(\mathcal{A}) = t$. We say that $v$ satisfies $\mathcal{A}$.

A set of formulas $\Gamma$ is satisfiable iff for some valuation $v$, $\bar{v}(\mathcal{A}) = t$ for
every $\mathcal{A}$ in $\Gamma$. We say that $v$ satisfies $\Gamma$.

A formula $\mathcal{A} \in$ Wff (equivalently, in Prop) is unsatisfiable iff for all
valuations $v$, $\bar{v}(\mathcal{A}) = f$. A set of formulas $\Gamma$ is unsatisfiable iff for all valuations
$v$, $\bar{v}(\mathcal{A}) = f$ for some $\mathcal{A}$ in $\Gamma$.


“Satisfiable” and “unsatisfiable” are terms introduced here in the propositional 
or Boolean sense. These terms have a more complicated meaning when we 
decide to “see” the object variables and quantifiers that occur in formulas. 


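I.3.7 Definition (Tautologically Implies, for Formulas in Wff). Let $\mathcal{A}$ and
$\Gamma$ be respectively any formula and any set of formulas (over $L$). The symbol
$\Gamma \models_{\mathrm{taut}} \mathcal{A}$, pronounced “$\Gamma$ tautologically implies $\mathcal{A}$”, means that every truth
assignment $v$ that satisfies $\Gamma$ also satisfies $\mathcal{A}$.

We have at once

I.3.8 Lemma.† $\Gamma \models_{\mathrm{taut}} \mathcal{A}$ iff $\Gamma \cup \{\neg\mathcal{A}\}$ is unsatisfiable (in the propositional
sense).

If $\Gamma = \emptyset$, then $\Gamma \models_{\mathrm{taut}} \mathcal{A}$ says just $\models_{\mathrm{taut}} \mathcal{A}$, since the hypothesis “every truth
assignment $v$ that satisfies $\Gamma$”, in the definition above, is vacuously satisfied.
For that reason we almost never write $\emptyset \models_{\mathrm{taut}} \mathcal{A}$ and write instead $\models_{\mathrm{taut}} \mathcal{A}$.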
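For finite sets of formulas, tautological implication and the lemma relating it to unsatisfiability can be checked by enumerating valuations. The Python sketch below is an illustration only: the tuple encoding and every function name are assumptions, 0 stands for t as in the text, and only the prime formulas that actually occur are enumerated (the other variables cannot matter).

```python
from itertools import product


def value(formula, v):
    """Value of a formula under v, with 0 = t and 1 = f (hypothetical encoding)."""
    if isinstance(formula, str):
        return v[formula]
    if formula[0] == "not":
        return 1 - value(formula[1], v)
    return value(formula[1], v) * value(formula[2], v)   # ("or", A, B)


def primes(formula):
    """Set of prime formulas occurring in a formula."""
    if isinstance(formula, str):
        return {formula}
    return set().union(*(primes(sub) for sub in formula[1:]))


def valuations(formulas):
    """All assignments to the prime formulas occurring in `formulas`."""
    names = sorted(set().union(*(primes(f) for f in formulas)))
    for bits in product((0, 1), repeat=len(names)):
        yield dict(zip(names, bits))


def taut_implies(gamma, a):
    """True iff every valuation satisfying all of gamma also satisfies a."""
    return all(value(a, v) == 0
               for v in valuations(list(gamma) + [a])
               if all(value(g, v) == 0 for g in gamma))


def unsatisfiable(gamma):
    """True iff no valuation satisfies every formula in gamma."""
    return not any(all(value(g, v) == 0 for g in gamma)
                   for v in valuations(list(gamma)))


gamma = ["p", ("or", ("not", "p"), "q")]          # p, and p → q written out
print(taut_implies(gamma, "q"))                   # q follows
print(unsatisfiable(gamma + [("not", "q")]))      # adding ¬q leaves no valuation
```

Both printed answers agree, as the lemma predicts; replacing `gamma` by `["p"]` makes both answers flip together.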


I.3.9 Exercise. For any formula $\mathcal{A}$ and any two valuations $v$ and $v'$, $\bar{v}(\mathcal{A}) = \bar{v}'(\mathcal{A})$
if $v$ and $v'$ agree on all the propositional variables that occur in $\mathcal{A}$.

In the same manner, $\Gamma \models_{\mathrm{taut}} \mathcal{A}$ is oblivious to $v$-variations that do not
affect the variables that occur in $\Gamma$ and $\mathcal{A}$ (see Exercise I.7).

Before presenting the axioms, we need to introduce the concept of substitution.


I.3.10 Tentative Definition (Substitution of a Term for a Variable). Let $\mathcal{A}$
be a formula, $x$ an (object) variable, and $t$ a term.

$\mathcal{A}[x \leftarrow t]$ denotes the result of “replacing” all free occurrences of $x$ in $\mathcal{A}$
by the term $t$, provided no variable of $t$ was “captured” (by a quantifier) during

† The word “lemma” has Greek origin, “λήμμα”, plural “lemmata” (many people say “lemmas”)
from “λήμματα”. It derives from the verb “λαμβάνω” (to take) and thus means “taken thing”.
In mathematical reasoning a lemma is a provable auxiliary statement that is taken and used as
a stepping stone in lengthy mathematical arguments (invoked therein by name, as in “... by
Lemma such and such ...”), much as subroutines (or procedures) are taken and used as auxiliary
stepping stones to elucidate lengthy computer programs. Thus our purpose in having lemmata is
to shorten proofs by breaking them up into modules.


substitution. If the proviso is valid, then we say that “$t$ is substitutable for $x$
(in $\mathcal{A}$)”, or that “$t$ is free for $x$ (in $\mathcal{A}$)”.
If the proviso is not valid, then the substitution is undefined.


I.3.11 Remark. There are a number of issues about Definition I.3.10 that need
discussion or clarification.

Any reasonable person will be satisfied with the above definition “as is”.
However, there are some obscure points (deliberately quoted, above).

(1) What is this about “capture”? Well, suppose that $\mathcal{A} \equiv (\exists x)\neg x = y$. Let
$t \equiv x$.† Then, if we ignore the proviso in I.3.10, $\mathcal{A}[y \leftarrow t] \equiv (\exists x)\neg x = x$,
which says something altogether different than the original. Intuitively,
this is unexpected (and undesirable): $\mathcal{A}$ codes a statement about the free
variable $y$, i.e., a statement about all objects which could be values (or
meanings) of $y$. One would have expected that, in particular, $\mathcal{A}[y \leftarrow x]$
(if the substitution were allowed) would make this very same statement
about the values of $x$. It does not. What happened is that $x$ was captured
by the quantifier upon substitution, thus distorting $\mathcal{A}$'s original meaning.‡

(2) Are we sure that the term “replace” is mathematically precise?

(3) Is $\mathcal{A}[x \leftarrow t]$ always a formula, if $\mathcal{A}$ is?

A revisitation of I.3.10 via an inductive definition (by induction on terms
and formulas) settles (1)-(3) at once (in particular, the intuitive terms “replace”
and “capture” do not appear in the inductive definition). Here it goes:

First off, let us define $s[x \leftarrow t]$, where $s$ is also a term, by cases:

$$
s[x \leftarrow t] \equiv
\begin{cases}
t & \text{if } s \equiv x \\
a & \text{if } s \equiv a, \text{ a constant (symbol)} \\
y & \text{if } s \equiv y, \text{ a variable } \not\equiv x \\
f\, r_1[x \leftarrow t]\, r_2[x \leftarrow t] \ldots r_n[x \leftarrow t] & \text{if } s \equiv f r_1 \ldots r_n
\end{cases}
$$

Pause. Is $s[x \leftarrow t]$ always a term? That this is so follows directly by induction
on terms, using the definition by cases above and the I.H. that each of $r_i[x \leftarrow t]$,
$i = 1, \ldots, n$, is a term.


† Recall that in I.1.4 (p. 13) we defined the symbol “$\equiv$” to be equality on strings. No further
reminders will be issued.

‡ And that is why the substitution is not allowed. The original formula says that for any object $y$
there is an object that is different from it. On the other hand, $\mathcal{A}[y \leftarrow x]$ says that there is an
object that is different from itself.
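We turn now to formulas. The symbols $P, r, s$ (with or without subscripts)
below denote a predicate of arity $n$, a term and a term (respectively).

$$
\mathcal{A}[x \leftarrow t] \equiv
\begin{cases}
s[x \leftarrow t] = r[x \leftarrow t] & \text{if } \mathcal{A} \equiv s = r \\
P\, r_1[x \leftarrow t]\, r_2[x \leftarrow t] \ldots r_n[x \leftarrow t] & \text{if } \mathcal{A} \equiv P r_1 \ldots r_n \\
(\mathcal{B}[x \leftarrow t] \lor \mathcal{C}[x \leftarrow t]) & \text{if } \mathcal{A} \equiv (\mathcal{B} \lor \mathcal{C}) \\
(\neg(\mathcal{B}[x \leftarrow t])) & \text{if } \mathcal{A} \equiv (\neg\mathcal{B}) \\
\mathcal{A} & \text{if } \mathcal{A} \equiv ((\exists y)\mathcal{B}) \text{ and } y \equiv x \\
((\exists y)(\mathcal{B}[x \leftarrow t])) & \text{if } \mathcal{A} \equiv ((\exists y)\mathcal{B}) \text{ and } y \not\equiv x \text{ and } y \text{ does not occur in } t
\end{cases}
$$

In all cases above, the left hand side is defined iff the right hand side is.

Pause. We have eliminated “replaces” and “captured”. Is though $\mathcal{A}[x \leftarrow t]$ a
formula (whenever it is defined)? (See Exercise I.8.)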
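The definition by cases translates almost line by line into a recursive program. The following Python sketch is an illustration under assumed encodings: terms are `("var", name)`, `("const", name)` or `("fn", name, args)`, formulas are `("eq", s, r)`, `("pred", name, args)`, `("not", B)`, `("or", B, C)` or `("ex", y, B)`, and `None` plays the role of “undefined”. None of these names come from the text.

```python
def term_vars(s):
    """Set of variable names occurring in the term s."""
    if s[0] == "var":
        return {s[1]}
    if s[0] == "const":
        return set()
    return set().union(*(term_vars(r) for r in s[2]))    # ("fn", f, args)


def subst_term(s, x, t):
    """s[x <- t] for a term s; always defined."""
    if s[0] == "var":
        return t if s[1] == x else s
    if s[0] == "const":
        return s
    return ("fn", s[1], tuple(subst_term(r, x, t) for r in s[2]))


def subst(A, x, t):
    """A[x <- t] for a formula A, or None when the substitution is undefined."""
    kind = A[0]
    if kind == "eq":
        return ("eq", subst_term(A[1], x, t), subst_term(A[2], x, t))
    if kind == "pred":
        return ("pred", A[1], tuple(subst_term(r, x, t) for r in A[2]))
    if kind == "not":
        B = subst(A[1], x, t)
        return None if B is None else ("not", B)
    if kind == "or":
        B, C = subst(A[1], x, t), subst(A[2], x, t)
        return None if B is None or C is None else ("or", B, C)
    if kind == "ex":
        y, body = A[1], A[2]
        if y == x:
            return A                      # x has no free occurrence to replace
        if y in term_vars(t):
            return None                   # the quantifier would capture y in t
        B = subst(body, x, t)
        return None if B is None else ("ex", y, B)
    raise ValueError(f"not a formula: {A!r}")


# The formula (∃x)¬x = y from the remark on capture.
A = ("ex", "x", ("not", ("eq", ("var", "x"), ("var", "y"))))
print(subst(A, "y", ("var", "x")))        # undefined: x would be captured
print(subst(A, "y", ("var", "z")))        # fine: y is replaced by z
```

As in the text, the quantifier case refuses the substitution whenever the bound variable occurs in $t$, even if $x$ has no free occurrence in the body; a more permissive variant would be a different definition.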


I.3.12 Definition (Simultaneous Substitution). The symbols $\mathcal{A}[y_1, \ldots, y_r \leftarrow t_1, \ldots, t_r]$
or, equivalently, $\mathcal{A}[\vec{y}_r \leftarrow \vec{t}_r]$ (where $\vec{y}_r$ is an abbreviation of
$y_1, \ldots, y_r$) denote simultaneous substitution of the terms $t_1, \ldots, t_r$ into the
variables $y_1, \ldots, y_r$ in the following sense: Let $\vec{z}_r$ be variables that do not occur
at all (either as free or bound) in either $\mathcal{A}$ or $\vec{t}_r$. Then $\mathcal{A}[\vec{y}_r \leftarrow \vec{t}_r]$ is short for

$$\mathcal{A}[y_1 \leftarrow z_1] \ldots [y_r \leftarrow z_r][z_1 \leftarrow t_1] \ldots [z_r \leftarrow t_r] \tag{1}$$

Exercise I.9 shows that we obtain the same string in (1) above, regardless of
our choice of new variables $\vec{z}_r$.


More conventions: The symbol $[x \leftarrow t]$ lies in the metalanguage. This
metasymbol has the highest priority, so that, e.g., $\mathcal{A} \lor \mathcal{B}[x \leftarrow t]$ means
$\mathcal{A} \lor (\mathcal{B}[x \leftarrow t])$, $(\exists x)\mathcal{B}[x \leftarrow t]$ means $(\exists x)(\mathcal{B}[x \leftarrow t])$, etc.

We often write $\mathcal{A}[y_1, \ldots, y_r]$, rather than the terse $\mathcal{A}$, in order to convey
our interest in the free variables $y_1, \ldots, y_r$ that may or may not actually appear
free in $\mathcal{A}$. Other variables, not mentioned in the notation, may also be free in
$\mathcal{A}$ (see also I.1.11).

In this context, if $t_1, \ldots, t_r$ are terms, the symbol $\mathcal{A}[t_1, \ldots, t_r]$ abbreviates
$\mathcal{A}[\vec{y}_r \leftarrow \vec{t}_r]$.

We are ready to introduce the (logical) axioms and rules of inference. 
Schemata.† Some of the axioms below will actually be schemata. A formula
schema, or formula form, is a string $\mathcal{G}$ of the metalanguage that contains
syntactic variables (or metavariables), such as $\mathcal{A}, P, f, a, t, x$.

Whenever we replace all these syntactic variables that occur in $\mathcal{G}$ by specific
formulas, predicates, functions, constants, terms, or variables respectively,
we obtain a specific well-formed formula, a so-called instance of the schema.
For example, an instance of $(\exists x)x = a$ is $(\exists v_{12})v_{12} = 0$ (in the language of
Peano arithmetic). An instance of $\mathcal{A} \to \mathcal{A}$ is $v_{101} = v_{114} \to v_{101} = v_{114}$.


I.3.13 Definition (Axioms and Axiom Schemata). The logical axioms are all
the formulas in the group Ax1 and all the possible instances of the schemata in
the remaining groups:

Ax1. All formulas in Taut.
Ax2. (Substitution axiom. Schema)

$$\mathcal{A}[x \leftarrow t] \to (\exists x)\mathcal{A} \quad\text{for any term } t.$$

By I.3.10-I.3.11, the notation already imposes a condition on $t$, that it is
substitutable for $x$.

N.B. We often see the above written as

$$\mathcal{A}[t] \to (\exists x)\mathcal{A}[x]$$

or even

$$\mathcal{A}[t] \to (\exists x)\mathcal{A}$$

Ax3. (Schema) For each object variable $x$, the formula $x = x$.
Ax4. (Leibniz’s characterization of equality, first order version. Schema) For
any formula $\mathcal{A}$, object variable $x$, and any terms $t$ and $s$, the formula

$$t = s \to (\mathcal{A}[x \leftarrow t] \leftrightarrow \mathcal{A}[x \leftarrow s])$$

N.B. The above is written usually as

$$t = s \to (\mathcal{A}[t] \leftrightarrow \mathcal{A}[s])$$

as long as we remember that the notation already requires that $t$ and $s$ be free
for $x$. We will denote the above set of logical axioms $\Lambda$.


† Plural of schema. This is of Greek origin, σχήμα, meaning (e.g., in geometry) figure or
configuration or even formation.
The logical axioms for equality are not the strongest possible, but they are
adequate for the job. What Leibniz really proposed was the schema $t = s \leftrightarrow
(\forall P)(P[t] \leftrightarrow P[s])$, which says, intuitively, that “two objects $t$ and $s$ are equal
iff, for every property $P$, both have $P$ or neither has $P$”.

Unfortunately, our system of notation (first-order language) does not allow
quantification over predicate symbols (which can have as “values” arbitrary
“properties”). But is not Ax4 read “for all formulas $\mathcal{A}$” anyway? Yes, but with
one qualification: “For all formulas $\mathcal{A}$ that we can write down in our system of
notation”, and, alas, we cannot write all possible formulas of real mathematics
down, because they are too many.†

While the symbol “=” is suggestive of equality, it is not its shape that qualifies
it. It is the two axioms, Ax3 and Ax4, that make the symbol behave as we
expect equality to behave, and any other symbol of any other shape (e.g.,
Enderton (1972) uses “≈”) satisfying these two axioms qualifies as formal
equality that is intended to codify the metamathematical standard “=”.


I.3.14 Remark. In Ax2 and Ax4 we imposed the condition that $t$ (and $s$) must
be substitutable in $x$. Here is why:

Take $\mathcal{A}$ to stand for $(\forall y)x = y$ and $\mathcal{B}$ to stand for $(\exists y)\neg x = y$. Then,
temporarily suspending the restriction on substitutability, $\mathcal{A}[x \leftarrow y] \to (\exists x)\mathcal{A}$ is

$$(\forall y)y = y \to (\exists x)(\forall y)x = y$$

and $x = y \to (\mathcal{B} \leftrightarrow \mathcal{B}[x \leftarrow y])$ is

$$x = y \to ((\exists y)\neg x = y \leftrightarrow (\exists y)\neg y = y)$$

neither of which, obviously, is “valid”.‡

There is a remedy in the metamathematics: That is, move the quantified
variable(s) out of harm’s way, by renaming them so that no quantified variable
in $\mathcal{A}$ has the same name as any (free, of course) variable in $t$ (or $s$).

This renaming is formally correct (i.e., it does not change the meaning of
the formula) as we will see in the variant (meta)theorem I.4.13. Of course,
it is always possible to effect this renaming, since we have countably many
variables, and only finitely many appear free in $t$ (and $s$) and $\mathcal{A}$.


† Uncountably many, in a precise technical sense that we will introduce in Chapter VII. This is
due to Cantor’s theorem, which implies that there are uncountably many subsets of $\mathbb{N}$. Each such
subset $A$ gives rise to the formula, $x \in A$, in the metalanguage.

On the other hand, our formal system of notation, using just $\in$ and $U$ as start-up (nonlogical)
symbols, is not rich enough to write down but a countably infinite set of formulas (at some point
later, Example VII.5.17, this will be clear). Thus, our notation will fail to denote uncountably
many “real formulas” $x \in A$.

‡ Speaking intuitively is enough for now. Validity will be defined carefully pretty soon.
This trivial remedy allows us to render the conditions in Ax2 and Ax4 harmless. Essentially, a $t$ (or $s$) is always substitutable after renaming.


I.3.15 Definition (Rules of Inference). The following two are the only primitive† rules of inference. These rules are relations with inputs from the set Wff and outputs also in Wff. They are written down, traditionally, as “fractions” through the employment of syntactic (or meta-) variables. We call the “numerator” the premise(s) and the “denominator” the conclusion.

We say that a rule of inference is applied to any instance of the formula schema(ta) in the numerator, and that it yields (or results in) the corresponding instance‡ of the formula schema in the denominator.

Inf1. Modus ponens, or MP, is the rule

$$\frac{\mathcal{A},\ \mathcal{A} \to \mathcal{B}}{\mathcal{B}}$$

Inf2. $\exists$-introduction (pronounced E-introduction) is the rule

$$\frac{\mathcal{A} \to \mathcal{B}}{(\exists x)\mathcal{A} \to \mathcal{B}}$$

that is applicable if a side condition is met: that $x$ is not free in $\mathcal{B}$.


N.B. Recall the conventions on eliminating brackets. 


It is immediately clear that the definition above meets our requirement that the 
rules of inference be “algorithmic”, in the sense that whether they are applicable 
can be decided and their application can be carried out in a finite number of 
steps by just looking at the form of (potential input) formulas (not at their 
meaning). 


We next define $\Gamma$-theorems, that is, formulas we can prove from the set of formulas $\Gamma$ (this $\Gamma$ may be empty).

I.3.16 Definition ($\Gamma$-Theorems). The set of $\Gamma$-theorems, $\mathbf{Thm}_\Gamma$, is the least inclusive subset of Wff that satisfies

Th1. $\Lambda \subseteq \mathbf{Thm}_\Gamma$ (see I.3.13).
Th2. $\Gamma \subseteq \mathbf{Thm}_\Gamma$. We call every member of $\Gamma$ a nonlogical axiom.
Th3. $\mathbf{Thm}_\Gamma$ is closed under each rule Inf1–Inf2.


† That is, given initially. Other rules can be proved to hold, and we call them derived rules.

‡ The corresponding instance is the one obtained from the schema in the denominator by replacing each of its metavariables by the same specific formula, or term, used to instantiate all the occurrences of the same metavariable in the numerator.
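The metalinguistic statement $\mathcal{A} \in \mathbf{Thm}_\Gamma$ is traditionally written as $\Gamma \vdash \mathcal{A}$, and we say that $\mathcal{A}$ is proved from $\Gamma$ or that it is a $\Gamma$-theorem.

We also say that $\mathcal{A}$ is deduced from $\Gamma$, or that $\Gamma$ deduces $\mathcal{A}$.

If $\Gamma = \emptyset$, then rather than $\emptyset \vdash \mathcal{A}$ we write $\vdash \mathcal{A}$. We often say in this case that $\mathcal{A}$ is absolutely provable (or provable with no nonlogical axioms).

We often write $\mathcal{A}, \mathcal{B}, \ldots, \mathcal{D} \vdash \mathcal{E}$ for $\{\mathcal{A}, \mathcal{B}, \ldots, \mathcal{D}\} \vdash \mathcal{E}$.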


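I.3.17 Definition ($\Gamma$-Proofs). We just saw that $\mathbf{Thm}_\Gamma = \mathrm{Cl}(\mathcal{I}, \mathcal{R})$, where $\mathcal{I} = \Lambda \cup \Gamma$ and $\mathcal{R}$ contains just the two rules of inference. An $(\mathcal{I}, \mathcal{R})$-derivation is also called a $\Gamma$-proof (or just proof, if $\Gamma$ is understood).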


I.3.18 Remark. (1) It is clear that if each of $\mathcal{A}_1, \ldots, \mathcal{A}_n$ has a $\Gamma$-proof and $\mathcal{B}$ has an $\{\mathcal{A}_1, \ldots, \mathcal{A}_n\}$-proof, then $\mathcal{B}$ has a $\Gamma$-proof. Indeed, simply concatenate each of the given $\Gamma$-proofs (in any sequence). Append to the right of that sequence the given $\{\mathcal{A}_1, \ldots, \mathcal{A}_n\}$-proof (that ends with $\mathcal{B}$). Then the entire sequence is a $\Gamma$-proof, and ends with $\mathcal{B}$.

We refer to this phenomenon as the transitivity of $\vdash$.


Very important. Transitivity of $\vdash$ allows one to invoke previously proved (by oneself or others) theorems in the course of a proof. Thus, practically, a $\Gamma$-proof is a sequence of formulas in which each formula is an axiom, is a known $\Gamma$-theorem, or is obtained by applying a rule of inference to previous formulas of the sequence.


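(2) If $\Gamma \subseteq \Delta$ and $\Gamma \vdash \mathcal{A}$, then also $\Delta \vdash \mathcal{A}$, as follows from I.3.16 or I.3.17. In particular, $\vdash \mathcal{A}$ implies $\Gamma \vdash \mathcal{A}$ for any $\Gamma$.

(3) It is immediate from the definitions that for any formulas $\mathcal{A}$ and $\mathcal{B}$,

$$\mathcal{A},\ \mathcal{A} \to \mathcal{B} \vdash \mathcal{B} \tag{i}$$

and, if, moreover, $x$ is not free in $\mathcal{B}$,

$$\mathcal{A} \to \mathcal{B} \vdash (\exists x)\mathcal{A} \to \mathcal{B} \tag{ii}$$

Some texts (e.g., Schütte (1977)) give the rules in the format of (i)–(ii) above.

□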
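The practical reading of a $\Gamma$-proof in item (1), a sequence in which every line is an axiom or follows from earlier lines by a rule, can be illustrated with a tiny checker. This is a minimal sketch and not the formal development of the text: formulas are encoded as nested Python tuples (an assumed encoding), logical and nonlogical axioms are lumped into one set, only modus ponens is checked ($\exists$-introduction is omitted), and the names `imp`, `is_mp`, and `check_proof` are hypothetical.

```python
# Formulas: atoms are strings; an implication A -> B is the tuple ("->", A, B).
# This encoding and all function names are assumptions made for illustration.

def imp(a, b):
    return ("->", a, b)

def is_mp(conclusion, earlier):
    """True if some A and A -> conclusion both occur among the earlier lines."""
    return any(imp(a, conclusion) in earlier for a in earlier)

def check_proof(lines, axioms):
    """Check that each line is an axiom (all axioms are lumped into `axioms`)
    or follows from earlier lines by modus ponens."""
    seen = []
    for f in lines:
        if f not in axioms and not is_mp(f, seen):
            return False
        seen.append(f)
    return True

gamma = {"A", imp("A", "B"), imp("B", "C")}
proof_of_B = ["A", imp("A", "B"), "B"]
# Transitivity in miniature: extend a proof of B to a proof of C by appending.
proof_of_C = proof_of_B + [imp("B", "C"), "C"]
```

Here `check_proof(proof_of_C, gamma)` succeeds precisely because the earlier proof of `B` is carried along as a prefix, which is the concatenation argument of item (1) in miniature.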


The axioms and rules provide us with a calculus, that is, a means to “calcu- 
late” (used synonymously with construct) proofs and theorems. In the interest 
of making the calculus more user-friendly — and thus more easily applicable to 
mathematical theories of interest, such as set theory — we are going to develop in 
the next section a number of “derived principles”. These principles are largely 
of the form $\mathcal{A}_1, \ldots, \mathcal{A}_n \vdash \mathcal{B}$. We call such a (provable in the metatheory) principle a derived rule of inference, since, by transitivity of $\vdash$, it can be used as a proof step in a $\Gamma$-proof. By contrast, the rules Inf1–Inf2 are “primitive” (or “basic” or “primary”); they are given outright.


We can now fix our understanding of the concept of a formal or mathematical 
theory. 

A (first order) formal (mathematical) theory, or just theory over a language $L$, or just theory, is a tuple (of “ingredients”) $\mathbf{T} = (L, \Lambda, I, \mathcal{T})$, where $L$ is a first order language, $\Lambda$ is a set of logical axioms, $I$ is a set of rules of inference, and $\mathcal{T}$ is a non-empty subset of Wff that is required to contain $\Lambda$ (i.e., $\Lambda \subseteq \mathcal{T}$) and be closed under the rules $I$.

Equivalently, one may simply require that $\mathcal{T}$ be closed under $\vdash$, that is, for any $\Gamma \subseteq \mathcal{T}$ and any formula $\mathcal{A}$, if $\Gamma \vdash \mathcal{A}$, then $\mathcal{A} \in \mathcal{T}$. This is, furthermore, equivalent to requiring that

$$\mathcal{A} \in \mathcal{T} \text{ iff } \mathcal{T} \vdash \mathcal{A} \tag{1}$$

Indeed, the if direction follows from closure under $\vdash$, while the only-if direction is a consequence of Definition I.3.16.


$\mathcal{T}$ is the set of the formulas† of the theory, and we often say “a theory $\mathcal{T}$”, taking everything else for granted.

If $\mathcal{T}$ = Wff, then the theory $\mathbf{T}$ is called inconsistent or contradictory. Otherwise it is called consistent.

Throughout our exposition we fix $\Lambda$ and $I$ as in Definitions I.3.13 and I.3.15. By (1), $\mathcal{T} = \mathbf{Thm}_{\mathcal{T}}$. This observation suggests that we call theories, such as the ones we have just defined, axiomatic theories, in that a set $\Gamma$ always exists such that $\mathcal{T} = \mathbf{Thm}_\Gamma$ (if at a loss, we can just take $\Gamma = \mathcal{T}$).


We are mostly interested in theories $\mathbf{T}$ for which there is a “small” set $\Gamma$ (“small” by comparison with $\mathcal{T}$) such that $\mathcal{T} = \mathbf{Thm}_\Gamma$. We say that $\mathbf{T}$ is axiomatized by $\Gamma$. Naturally, we call $\mathcal{T}$ the set of theorems, and $\Gamma$ the set of nonlogical axioms, of $\mathbf{T}$.

If, moreover, $\Gamma$ is “recognizable” (i.e., we can tell “algorithmically” whether or not a formula $\mathcal{A}$ is in $\Gamma$), then we say that $\mathbf{T}$ is recursively axiomatized.

Examples of recursively axiomatized theories are ZFC set theory and Peano arithmetic. On the other hand, if we take $\mathcal{T}$ to be all the sentences of arithmetic


† As opposed to “of the language”, which is all of Wff.
that are true when interpreted “in the standard way”† over N (the so-called complete arithmetic), then there is no recognizable $\Gamma$ such that $\mathcal{T} = \mathbf{Thm}_\Gamma$. We say that complete arithmetic is not recursively axiomatizable.‡

Pause. Why does complete arithmetic form a theory? Because work of the next section (in particular, the soundness theorem) entails that it is closed under $\vdash$.


We tend to further abuse language and call axiomatic theories by the name of their (set of) nonlogical axioms $\Gamma$. Thus if $\mathbf{T} = (L, \Lambda, I, \mathcal{T})$ is a first order theory and $\mathcal{T} = \mathbf{Thm}_\Gamma$, then we may say interchangeably “theory $\mathbf{T}$”, “theory $\mathcal{T}$”, or “theory $\Gamma$”.

If $\Gamma = \emptyset$, then we have a pure or absolute theory (i.e., we are “just doing logic, not math”). If $\Gamma \neq \emptyset$ then we have an applied theory. □


Argot. A final note on language versus metalanguage, and theory versus meta- 
theory. When are we speaking the metalanguage, and when are we speaking 
the formal language? 


The answers are, respectively, “almost always” and “almost never”. As has
been remarked before, in principle, we are speaking the formal language exactly 
when we are pronouncing or writing down a string from Term or Wff. Otherwise 
we are (speaking or writing) in the metalanguage. It appears that we (and 
everybody else who has written a book in logic or set theory) are speaking and 
writing within the metalanguage with a frequency approaching 100%. 

The formalist is clever enough to simplify notation at all times. We will 
seldom be caught writing down a member of Wff in this book, and, on the rare 
occasions we may do so, it will only be to serve as an illustration of why one 
should avoid writing down such formulas: because they are too long and hard 
to read and understand. 

We will be speaking the formal language with a heavy “accent” and using 
many idioms borrowed from “real” (meta-) mathematics, and English. We will 
call our dialect argot, following Manin (1977). 

A related, and practically more important, question is “When are we arguing in the theory, and when are we arguing in the metatheory?”.§ That is, the question is not about how we speak, but about what we are saying when we speak.


† That is, the symbol “0” of the language is interpreted as $0 \in$ N, “$Sx$” as $x + 1$, “$(\exists x)$” as “there is an $x \in$ N”, etc.

‡ The trivial “solution”, that is, taking $\Gamma = \mathcal{T}$, will not do, for $\mathcal{T}$ is not recognizable.

§ Important, because arguing in the theory restricts us to use only its axioms (and earlier proved theorems; cf. I.3.18) and its rules of inference; nothing extraneous to these syntactic tools is allowed.
The answer to this is also easy: Once we have fixed a theory $\mathbf{T}$ and the nonlogical axioms $\Gamma$, we are working in the theory iff we are writing down a ($\Gamma$-) proof of some specific formula $\mathcal{A}$. It does not matter if $\mathcal{A}$ (and much of what we write down during the proof) is in argot.

Two examples: 


(1) One is working in formal number theory (or formal arithmetic) if one states and proves (say, from the Peano axioms) that “every natural number $n > 1$ has a prime factor”. Note how this theorem is stated in argot. Below we give its translation into the formal language of arithmetic:†

$$(\forall n)\bigl(S0 < n \to (\exists x)(\exists y)(n = x \times y \land S0 < x \land (\forall m)(\forall r)(x = m \times r \to m = S0 \lor m = x))\bigr) \tag{1}$$

(2) One is working in formal logic if one is writing a proof of $(\exists v_{13})v_{13} = v_{13}$.


Suppose though that our activity consists of effecting definitions, introducing axioms, or analyzing the behaviour or capability of $\mathbf{T}$, e.g., proving some derived rule $\mathcal{A}_1, \ldots, \mathcal{A}_n \vdash \mathcal{B}$ (that is, a theorem schema) or investigating consistency,‡ or relative consistency.§ Then we are operating in the metatheory, that is, in “real” mathematics.

One of the most important problems posed in the metatheory is
“Given a theory $\mathbf{T}$ and a formula $\mathcal{A}$. Is $\mathcal{A}$ a theorem of $\mathbf{T}$?”

This is Hilbert’s Entscheidungsproblem, or decision problem. Hilbert believed that every recursively axiomatized theory ought to admit a “general” solution, by more or less “mechanical means”, to its decision problem. The techniques of Gödel and the insight of Church showed that this problem is, in general, algorithmically unsolvable.


As we have already stated, metamathematics exists outside and indepen- 
dently of our effort to build this or that formal system. All its methods are — in 
principle — available to us for use in the analysis of the behaviour of a formal 
system. 


† Well, almost. In the interest of brevity, all the variable names used in the displayed formula (1) are metasymbols.

‡ That is, whether or not $\mathcal{T}$ = Wff.

§ That is, “if $\Gamma$ is consistent” (where we are naming the theory by its nonlogical axioms) “does it stay so after we have added some formula $\mathcal{A}$ as a nonlogical axiom?”.
Pause. But how much of real mathematics are we allowed to use, reliably, to study or speak about the “simulator” that the formal system is?†


For example, have we not overstepped our license by using induction (and, 
implicitly, the entire infinite set N), specifically the recursive definitions of 
terms, well-formed formulas, theorems, etc.? 

The quibble here is largely “political”. Some people argue (a major proponent 
of this was Hilbert) as follows: Formal mathematics was meant to crank out 
“true” statements of mathematics, but no “false” ones, and this freedom of 
contradiction ought to be verifiable. 

Now, as we are verifying so in the metatheory (i.e., outside the formal system), shouldn’t the metatheory itself be above suspicion (of contradiction, that is)? Naturally.


Hilbert’s suggestion towards achieving this “above suspicion” status was, 
essentially, to utilize in the metatheory only a small fragment of “reality” that 
is so simple and close to intuition that it does not need itself any “certificate” 
(via formalization) for its freedom from contradiction. 


In other words, restrict the metamathematics!‡

Such a fragment of the metatheory, he said, should have nothing to do with the “infinite”, in particular with the entire set N and all that it entails (e.g., inductive definitions and proofs).§

If it were not for Gödel’s incompleteness results, this position (that metamathematical techniques must be finitary) might have prevailed. However, Gödel proved it to be futile, and most mathematicians have learnt to feel comfortable with infinitary metamathematical techniques, or at least with N and induction.¶ Of course, it would be imprudent to use as metamathematical tools mathematics of suspect consistency (e.g., the full naive theory of sets).


† The methods or scope of the metamathematics that a logician uses, in the investigation of some formal system, are often restricted for technical or philosophical reasons.

‡ Otherwise we would need to formalize the metamathematics (in order to “certify” it) and next the metametamathematics, and so on. For if “metaM” is to authoritatively check “M” for consistency, then it too must be consistent; so let us formalize “metaM” and let “metametaM” check it; ... a never ending story.

§ See Hilbert and Bernays (1968, pp. 21–29) for an elaborate scheme that constructs “concrete number objects” (Ziffern or “numerals”), “|”, “||”, “|||”, etc., that stand for “1”, “2”, “3”, etc., complete with a “concrete mathematical induction” proof technique on these objects, and even the beginnings of their “recursion theory”. Of course, at any point, only finite sets of such objects were considered.

¶ Some proponents of infinitary techniques in metamathematics have used very strong words in describing the failure of Hilbert’s program. Rasiowa and Sikorski (1963) write in their introduction: “However Gödel’s results exposed the fiasco of Hilbert’s finitistic methods as far as consistency is concerned.”
It is worth pointing out that one could fit (with some effort) our inductive
definitions within Hilbert’s style. But we will not do so. 

First, one would have to abandon the elegant (and now widely used) approach 
with closures, and use instead the concept of derivations of Section I.2. 

Then one would somehow have to effect and study derivations without the 
benefit of the entire set N. Bourbaki (1966b, p. 15) does so with his construc- 
tions formatives. Hermes (1973) is another author who does so, with his “term-” 
and “formula-calculi” (such calculi being, essentially, finite descriptions of 
derivations). 

Bourbaki (but not Hermes) avoids induction over all of N. In his metamathematical discussions of terms and formulas† that are derived by a derivation $d_1, \ldots, d_n$, he restricts his induction arguments to the segment $\{0, 1, \ldots, n\}$, that is, he takes an I.H. on $k < n$ and proceeds to $k + 1$. □


I.4. Basic Metatheorems


We are dealing with an arbitrary theory $\mathbf{T} = (L, \Lambda, I, \mathcal{T})$, such that $\Lambda$ is the set of logical axioms I.3.13 and $I$ are the inference rules I.3.15. We also let $\Gamma$ be an appropriate set of nonlogical axioms, i.e., $\mathcal{T} = \mathbf{Thm}_\Gamma$.


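I.4.1 Metatheorem (Post’s “Extended” Tautology Theorem). If $\mathcal{A}_1, \ldots, \mathcal{A}_n \models_{taut} \mathcal{B}$ then $\mathcal{A}_1, \ldots, \mathcal{A}_n \vdash \mathcal{B}$.

Proof. The assumption yields that

$$\models_{taut} \mathcal{A}_1 \to \cdots \to \mathcal{A}_n \to \mathcal{B} \tag{1}$$

Thus, since the formula in (1) is in $\Lambda$, and using Definition I.3.16,

$$\mathcal{A}_1, \ldots, \mathcal{A}_n \vdash \mathcal{A}_1 \to \cdots \to \mathcal{A}_n \to \mathcal{B} \tag{2}$$

Applying modus ponens to (2), $n$ times, we deduce $\mathcal{B}$.

I.4.1 is an omnipresent derived rule. □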
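Because I.4.1 is invoked so often, it helps to see that its hypothesis, tautological implication, is mechanically checkable by truth tables. The sketch below is illustrative only: it treats the formulas as purely propositional, with strings standing in for the prime formulas (atomic or quantified) that act as propositional variables; the tuple encoding and the names `atoms`, `value`, and `taut_implies` are assumptions of this example.

```python
from itertools import product

# Formulas: atoms are strings; ("not", A) and ("->", A, B) are compound.
# The encoding and function names are illustrative assumptions.

def atoms(f):
    """Set of propositional variables occurring in f."""
    if isinstance(f, str):
        return {f}
    return set().union(*(atoms(g) for g in f[1:]))

def value(f, v):
    """Truth value of f under the valuation v (a dict from atoms to bool)."""
    if isinstance(f, str):
        return v[f]
    if f[0] == "not":
        return not value(f[1], v)
    if f[0] == "->":
        return (not value(f[1], v)) or value(f[2], v)
    raise ValueError(f"unknown connective {f[0]!r}")

def taut_implies(premises, conclusion):
    """True iff every valuation satisfying all premises satisfies conclusion."""
    names = sorted(set().union(atoms(conclusion), *(atoms(p) for p in premises)))
    for bits in product([False, True], repeat=len(names)):
        v = dict(zip(names, bits))
        if all(value(p, v) for p in premises) and not value(conclusion, v):
            return False
    return True
```

For instance, `taut_implies([("->", "A", "C"), ("->", "A", ("->", "C", "B"))], ("->", "A", "B"))` checks a tautological implication of the kind I.4.1 turns into a derived rule; the search is exponential in the number of atoms, which is harmless for the short formulas met in hand proofs.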


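I.4.2 Definition. $\mathcal{A}$ and $\mathcal{B}$ are provably equivalent in $\mathbf{T}$ means that $\Gamma \vdash \mathcal{A} \leftrightarrow \mathcal{B}$.

I.4.3 Metatheorem. Any two theorems $\mathcal{A}$ and $\mathcal{B}$ of $\mathbf{T}$ are provably equivalent in $\mathbf{T}$.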


† For example, in loc. cit., p. 18, where he proves that, in our notation, $\mathcal{A}[x \leftarrow y]$ and $t[x \leftarrow y]$ are a formula and term respectively.
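Proof. By I.4.1, $\Gamma \vdash \mathcal{A}$ yields $\Gamma \vdash \mathcal{B} \to \mathcal{A}$. Similarly, $\Gamma \vdash \mathcal{A} \to \mathcal{B}$ follows from $\Gamma \vdash \mathcal{B}$. One more application of I.4.1 yields $\Gamma \vdash \mathcal{A} \leftrightarrow \mathcal{B}$.

$\vdash \neg x = x \leftrightarrow \neg y = y$ (why?), but neither $\neg x = x$ nor $\neg y = y$ is a $\emptyset$-theorem. □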


I.4.4 Remark (Hilbert Style Proofs). In practice we write proofs “vertically”, that is, as numbered vertical sequences (or lists) of formulas. The numbering helps the annotational comments that we insert to the right of each formula that we list, as the following proof demonstrates.

A metatheorem admits a metaproof, strictly speaking. The following is a derived rule (or theorem schema) and thus belongs to the metatheory (and so does its proof).

Another point of view is possible, however: The syntactic symbols $x$, $\mathcal{A}$, and $\mathcal{B}$ below stand for a specific variable and specific formulas that we just forgot to write down explicitly. Then one can think of the proof as a (formal) Hilbert style proof. □

I.4.5 Metatheorem ($\forall$-Introduction, Pronounced A-Introduction). If $x$ does not occur free in $\mathcal{A}$, then $\mathcal{A} \to \mathcal{B} \vdash \mathcal{A} \to (\forall x)\mathcal{B}$.

Proof.

(1) $\mathcal{A} \to \mathcal{B}$    given
(2) $\neg\mathcal{B} \to \neg\mathcal{A}$    (1) and I.4.1
(3) $(\exists x)\neg\mathcal{B} \to \neg\mathcal{A}$    (2) and $\exists$-introduction
(4) $\mathcal{A} \to \neg(\exists x)\neg\mathcal{B}$    (3) and I.4.1
(5) $\mathcal{A} \to (\forall x)\mathcal{B}$    (4), introducing the $\forall$-abbreviation


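I.4.6 Metatheorem (Specialization). For any formula $\mathcal{A}$ and term $t$, $\vdash (\forall x)\mathcal{A} \to \mathcal{A}[t]$.

At this point, the reader may want to review our abbreviation conventions; in particular, see Ax2 (I.3.13). □

Proof.

(1) $\neg\mathcal{A}[t] \to (\exists x)\neg\mathcal{A}$    in $\Lambda$
(2) $\neg(\exists x)\neg\mathcal{A} \to \mathcal{A}[t]$    (1) and I.4.1
(3) $(\forall x)\mathcal{A} \to \mathcal{A}[t]$    (2), introducing the $\forall$-abbreviation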
I.4.7 Corollary. For any formula $\mathcal{A}$, $\vdash (\forall x)\mathcal{A} \to \mathcal{A}$.

Proof. $\mathcal{A}[x \leftarrow x] \equiv \mathcal{A}$.

Pause. Why is $\mathcal{A}[x \leftarrow x]$ the same string as $\mathcal{A}$?


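I.4.8 Metatheorem (Generalization). For any $\Gamma$ and any $\mathcal{A}$, if $\Gamma \vdash \mathcal{A}$, then $\Gamma \vdash (\forall x)\mathcal{A}$.

Proof. Choose $y \not\equiv x$. Then we continue any given proof of $\mathcal{A}$ (from $\Gamma$) as follows:

(1) $\mathcal{A}$    proved from $\Gamma$
(2) $y = y \to \mathcal{A}$    (1) and I.4.1
(3) $y = y \to (\forall x)\mathcal{A}$    (2) and $\forall$-introduction
(4) $y = y$    in $\Lambda$
(5) $(\forall x)\mathcal{A}$    (3), (4) and MP

I.4.9 Corollary. For any $\Gamma$ and any $\mathcal{A}$, $\Gamma \vdash \mathcal{A}$ iff $\Gamma \vdash (\forall x)\mathcal{A}$.

Proof. By I.4.7, I.4.8, and modus ponens.

I.4.10 Corollary. For any $\mathcal{A}$, $\mathcal{A} \vdash (\forall x)\mathcal{A}$ and $(\forall x)\mathcal{A} \vdash \mathcal{A}$.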


The above corollary motivates the following definition. It also justifies the common mathematical practice of the “implied universal quantifier”. That is, we often just state “$\ldots x \ldots$” when we mean “$(\forall x) \ldots x \ldots$”.

I.4.11 Definition (Universal Closure). Let $y_1, \ldots, y_n$ be the list of all free variables of $\mathcal{A}$. The universal closure of $\mathcal{A}$ is the formula $(\forall y_1)(\forall y_2) \cdots (\forall y_n)\mathcal{A}$, often written more simply as $(\forall y_1 y_2 \ldots y_n)\mathcal{A}$ or even $(\forall \vec{y}_n)\mathcal{A}$.

By I.4.10, a formula deduces and is deduced by its universal closure. □

Pause. We said the universal closure. Hopefully, the remark immediately above is robust to permutation of $(\forall y_1)(\forall y_2) \cdots (\forall y_n)$. Is it? (Exercise I.10.)


I.4.12 Corollary (Substitution of Terms). $\mathcal{A}[x_1, \ldots, x_n] \vdash \mathcal{A}[t_1, \ldots, t_n]$ for any terms $t_1, \ldots, t_n$.

The reader may wish to review I.3.12 and the remark following it. □
Proof. We illustrate the proof for $n = 2$. What makes it interesting is the requirement to have “simultaneous substitution”. To that end we first substitute into $x_1$ and $x_2$ new variables $z$, $w$, i.e., not occurring in either $\mathcal{A}$ or in the $t_i$. The proof is the following sequence. Comments justify, in each case, the presence of the formula immediately to the left by virtue of the presence of the immediately preceding formula:

$\mathcal{A}[x_1, x_2]$    starting point
$(\forall x_1)\mathcal{A}[x_1, x_2]$    generalization
$\mathcal{A}[z, x_2]$    specialization; $x_1 \leftarrow z$
$(\forall x_2)\mathcal{A}[z, x_2]$    generalization
$\mathcal{A}[z, w]$    specialization; $x_2 \leftarrow w$

Now $z \leftarrow t_1$, $w \leftarrow t_2$, in any order, is the same as simultaneous substitution I.3.12:

$(\forall z)\mathcal{A}[z, w]$    generalization
$\mathcal{A}[t_1, w]$    specialization; $z \leftarrow t_1$
$(\forall w)\mathcal{A}[t_1, w]$    generalization
$\mathcal{A}[t_1, t_2]$    specialization; $w \leftarrow t_2$
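The detour through the fresh variables $z$, $w$ matters because substituting one variable at a time is not, in general, the same as simultaneous substitution. The sketch below illustrates this on terms only (no quantifiers), under stated assumptions: variables are strings, compound terms are tuples headed by a function symbol, and the helper names `subst`, `subst_simultaneous`, and `subst_via_fresh` are hypothetical.

```python
# Terms: a variable is a string; ("f", t1, ..., tk) applies function symbol f.
# This encoding and the helper names are illustrative assumptions.

def subst(term, var, t):
    """Replace every occurrence of variable `var` in `term` by `t`."""
    if isinstance(term, str):
        return t if term == var else term
    return (term[0],) + tuple(subst(a, var, t) for a in term[1:])

def subst_simultaneous(term, mapping):
    """Replace all variables listed in `mapping` at once."""
    if isinstance(term, str):
        return mapping.get(term, term)
    return (term[0],) + tuple(subst_simultaneous(a, mapping) for a in term[1:])

def subst_via_fresh(term, x1, x2, t1, t2, z="z", w="w"):
    """Mirror the proof: x1 <- z, x2 <- w, then z <- t1, w <- t2.
    Assumes z and w occur in none of term, t1, t2."""
    step = subst(subst(term, x1, z), x2, w)
    return subst(subst(step, z, t1), w, t2)

a = ("+", "x1", "x2")          # stands in for A[x1, x2]
t1, t2 = ("S", "x2"), "x1"     # t1 mentions x2, and t2 mentions x1
naive = subst(subst(a, "x1", t1), "x2", t2)
simultaneous = subst_simultaneous(a, {"x1": t1, "x2": t2})
via_fresh = subst_via_fresh(a, "x1", "x2", t1, t2)
```

In this instance `naive` disagrees with `simultaneous`, because the second step rewrites the `x2` that the first step introduced, while `via_fresh` agrees with `simultaneous`, which is the point of first moving to $z$ and $w$.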


I.4.13 Metatheorem (Variant, or Dummy, Renaming). For any formula $(\exists x)\mathcal{A}$, if $z$ does not occur in it (i.e., is neither free nor bound), then $\vdash (\exists x)\mathcal{A} \leftrightarrow (\exists z)\mathcal{A}[x \leftarrow z]$.

We often write this (under the stated conditions) as $\vdash (\exists x)\mathcal{A}[x] \leftrightarrow (\exists z)\mathcal{A}[z]$. By the way, another way to state the conditions is “if $z$ does not occur in $\mathcal{A}$ (i.e., is neither free nor bound in $\mathcal{A}$), and is different from $x$”. Of course, if $z \equiv x$, then there is nothing to prove. □


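Proof. Since $z$ is substitutable in $x$ under the stated conditions, $\mathcal{A}[x \leftarrow z]$ is defined. Thus, by Ax2,

$$\vdash \mathcal{A}[x \leftarrow z] \to (\exists x)\mathcal{A}$$

By $\exists$-introduction (since $z$ is not free in $(\exists x)\mathcal{A}$) we also have

$$\vdash (\exists z)\mathcal{A}[x \leftarrow z] \to (\exists x)\mathcal{A} \tag{1}$$

We note that $x$ is not free in $(\exists z)\mathcal{A}[x \leftarrow z]$ and is free for $z$ in $\mathcal{A}[x \leftarrow z]$. Indeed, $\mathcal{A}[x \leftarrow z][z \leftarrow x] \equiv \mathcal{A}$. Thus, by Ax2,

$$\vdash \mathcal{A} \to (\exists z)\mathcal{A}[x \leftarrow z]$$

Hence, by $\exists$-introduction

$$\vdash (\exists x)\mathcal{A} \to (\exists z)\mathcal{A}[x \leftarrow z] \tag{2}$$

Tautological implication from (1) and (2) concludes the argument.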


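Why is $\mathcal{A}[x \leftarrow z][z \leftarrow x] \equiv \mathcal{A}$? We can see this by induction on $\mathcal{A}$ (recall that $z$ occurs as neither free nor bound in $\mathcal{A}$).

If $\mathcal{A}$ is atomic, then the claim is trivial. The claim also clearly “propagates” with the propositional formation rules, that is, I.1.8(b).

Consider then the case that $\mathcal{A} \equiv (\exists w)\mathcal{B}$. Note that $w \equiv x$ is possible under our assumptions, but $w \equiv z$ is not. If $w \equiv x$, then $\mathcal{A}[x \leftarrow z] \equiv \mathcal{A}$; in particular, $z$ is not free in $\mathcal{A}$; hence $\mathcal{A}[x \leftarrow z][z \leftarrow x] \equiv \mathcal{A}$ as well.

So let us work with $w \not\equiv x$. By the I.H., $\mathcal{B}[x \leftarrow z][z \leftarrow x] \equiv \mathcal{B}$. Now

$\mathcal{A}[x \leftarrow z][z \leftarrow x] \equiv ((\exists w)\mathcal{B})[x \leftarrow z][z \leftarrow x]$
$\equiv ((\exists w)\mathcal{B}[x \leftarrow z])[z \leftarrow x]$    see I.3.11; $w \not\equiv z$
$\equiv ((\exists w)\mathcal{B}[x \leftarrow z][z \leftarrow x])$    see I.3.11; $w \not\equiv x$
$\equiv ((\exists w)\mathcal{B})$    I.H.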


By I.4.13, the issue of substitutability becomes moot. Since we have an infinite supply of variables (to use, for example, as bound variables), we can always change the names of all the bound variables in $\mathcal{A}$ so that the new names are different from all the free variables in $\mathcal{A}$ or $t$. In doing so we obtain a formula $\mathcal{B}$ that is (absolutely) provably equivalent to the original.

Then $\mathcal{B}[x \leftarrow t]$ will be defined ($t$ will be substitutable in $x$). Thus, the moral is “any term $t$ is free for $x$ in $\mathcal{A}$ after an appropriate ‘dummy’ renaming”. □
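The moral just stated can be tried out on the formula $(\exists y)\neg x = y$ of Remark I.3.14. The following is a minimal sketch under simplifying assumptions: terms are variables only, formulas are nested tuples, and the names `free_vars`, `substitutable`, `replace_var`, and `rename_bound` are hypothetical helpers, not the definitions I.3.10 to I.3.12 of the text.

```python
# Formulas over variables only (no function symbols), for brevity:
#   ("=", x, y)        atomic
#   ("not", A)         negation
#   ("exists", x, A)   quantification
# Encoding and names are illustrative assumptions.

def free_vars(f):
    if f[0] == "=":
        return {f[1], f[2]}
    if f[0] == "not":
        return free_vars(f[1])
    return free_vars(f[2]) - {f[1]}          # "exists" case

def substitutable(f, x, y):
    """Is variable y substitutable for x in f, i.e. does no free occurrence
    of x lie in the scope of a quantifier that binds y?"""
    if f[0] == "=":
        return True
    if f[0] == "not":
        return substitutable(f[1], x, y)
    bound, body = f[1], f[2]
    if bound == x or x not in free_vars(body):
        return True                          # no free x below this quantifier
    return bound != y and substitutable(body, x, y)

def replace_var(f, old, new):
    """Replace every occurrence of old by new; safe only if new is fresh."""
    if f[0] == "=":
        return ("=",) + tuple(new if v == old else v for v in f[1:])
    if f[0] == "not":
        return ("not", replace_var(f[1], old, new))
    return ("exists", new if f[1] == old else f[1], replace_var(f[2], old, new))

def rename_bound(f, old, new):
    """Rename the bound variable old to new; assumes new occurs nowhere in f."""
    if f[0] == "=":
        return f
    if f[0] == "not":
        return ("not", rename_bound(f[1], old, new))
    if f[1] == old:
        return ("exists", new, replace_var(f[2], old, new))
    return ("exists", f[1], rename_bound(f[2], old, new))

B = ("exists", "y", ("not", ("=", "x", "y")))   # (exists y) not x = y
B_renamed = rename_bound(B, "y", "z")           # (exists z) not x = z
```

Before renaming, `substitutable(B, "x", "y")` is false, since $y$ would be captured; after the dummy renaming to `B_renamed`, the same substitution is allowed.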


I.4.14 Definition. In the sequel we will often discuss two (or more) theories at once. Let $\mathbf{T} = (L, \Lambda, I, \mathcal{T})$ and $\mathbf{T}' = (L', \Lambda', I', \mathcal{T}')$ be two theories such that $\mathcal{V} \subseteq \mathcal{V}'$. This enables $\mathbf{T}'$ to be “aware” of all the formulas of $\mathbf{T}$ (but not vice versa, since $L'$ may contain additional nonlogical symbols, the case where $\mathcal{V} \neq \mathcal{V}'$).

We say that $\mathbf{T}'$ is an extension of $\mathbf{T}$, in symbols $\mathbf{T} \leq \mathbf{T}'$, iff $\mathcal{T} \subseteq \mathcal{T}'$.

Let $\mathcal{A}$ be a formula over $L$ (so that both theories are aware of it). The symbols $\vdash_{\mathbf{T}} \mathcal{A}$ and $\vdash_{\mathbf{T}'} \mathcal{A}$ are synonymous with $\mathcal{A} \in \mathcal{T}$ and $\mathcal{A} \in \mathcal{T}'$ respectively.

Note that we did not explicitly mention the nonlogical axioms $\Gamma$ or $\Gamma'$ to the left of $\vdash$, since the subscript of $\vdash$ takes care of that information.

We say that the extension is conservative iff for any $\mathcal{A}$ over $L$, whenever $\vdash_{\mathbf{T}'} \mathcal{A}$ it is also the case that $\vdash_{\mathbf{T}} \mathcal{A}$. That is, when it comes to formulas over the language ($L$) that both theories understand, then the new theory does not do any better than the old in producing theorems.


I.4.15 Metatheorem (Metatheorem on Constants). Let us extend a language $L$ of a theory $\mathbf{T}$ by adding new constant symbols $e_1, \ldots, e_n$ to the alphabet $\mathcal{V}$, resulting in the alphabet $\mathcal{V}'$, language $L'$, and theory $\mathbf{T}'$. Furthermore, assume that $\Gamma' = \Gamma$, that is, we did not add any new nonlogical axioms.

Then $\vdash_{\mathbf{T}'} \mathcal{A}[e_1, \ldots, e_n]$ implies $\vdash_{\mathbf{T}} \mathcal{A}[x_1, \ldots, x_n]$, for any variables $x_1, \ldots, x_n$ that occur nowhere in $\mathcal{A}[e_1, \ldots, e_n]$, as either free or bound variables.
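Proof. Fix a set of variables $x_1, \ldots, x_n$ as described above. We do induction on $\mathbf{T}'$-theorems.

Basis. $\mathcal{A}[e_1, \ldots, e_n]$ is a logical axiom (over $L'$); hence so is $\mathcal{A}[x_1, \ldots, x_n]$, over $L$, because of the restriction on the $x_i$. Thus $\vdash_{\mathbf{T}} \mathcal{A}[x_1, \ldots, x_n]$. Note that if $\mathcal{A}[e_1, \ldots, e_n]$ is nonlogical, then so is $\mathcal{A}[x_1, \ldots, x_n]$ under our assumptions.

Pause. What does the restriction on the $x_i$ have to do with the claim above?

Modus ponens. Here $\vdash_{\mathbf{T}'} \mathcal{B}[e_1, \ldots, e_n] \to \mathcal{A}[e_1, \ldots, e_n]$ and $\vdash_{\mathbf{T}'} \mathcal{B}[e_1, \ldots, e_n]$. By I.H., $\vdash_{\mathbf{T}} \mathcal{B}[y_1, \ldots, y_n] \to \mathcal{A}[y_1, \ldots, y_n]$ and $\vdash_{\mathbf{T}} \mathcal{B}[y_1, \ldots, y_n]$, where $y_1, \ldots, y_n$ occur nowhere in $\mathcal{B}[e_1, \ldots, e_n] \to \mathcal{A}[e_1, \ldots, e_n]$ as either free or bound variables. By modus ponens, $\vdash_{\mathbf{T}} \mathcal{A}[y_1, \ldots, y_n]$; hence $\vdash_{\mathbf{T}} \mathcal{A}[x_1, \ldots, x_n]$ by I.4.12 (and I.4.13).

$\exists$-introduction. We have $\vdash_{\mathbf{T}'} \mathcal{B}[e_1, \ldots, e_n] \to \mathcal{C}[e_1, \ldots, e_n]$, $z$ is not free in $\mathcal{C}[e_1, \ldots, e_n]$, and $\mathcal{A}[e_1, \ldots, e_n] \equiv (\exists z)\mathcal{B}[e_1, \ldots, e_n] \to \mathcal{C}[e_1, \ldots, e_n]$. By the I.H., if $w_1, \ldots, w_n$ (distinct from $z$) occur nowhere in $\mathcal{B}[e_1, \ldots, e_n] \to \mathcal{C}[e_1, \ldots, e_n]$ as either free or bound, then we get $\vdash_{\mathbf{T}} \mathcal{B}[w_1, \ldots, w_n] \to \mathcal{C}[w_1, \ldots, w_n]$. By $\exists$-introduction we get $\vdash_{\mathbf{T}} (\exists z)\mathcal{B}[w_1, \ldots, w_n] \to \mathcal{C}[w_1, \ldots, w_n]$. By I.4.12 and I.4.13 we get $\vdash_{\mathbf{T}} (\exists z)\mathcal{B}[x_1, \ldots, x_n] \to \mathcal{C}[x_1, \ldots, x_n]$, i.e., $\vdash_{\mathbf{T}} \mathcal{A}[x_1, \ldots, x_n]$.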


I.4.16 Corollary. Let us extend a language $L$ of a theory $\mathbf{T}$ by adding new constant symbols $e_1, \ldots, e_n$ to the alphabet $\mathcal{V}$, resulting in the alphabet $\mathcal{V}'$, language $L'$, and theory $\mathbf{T}'$. Furthermore, assume that $\Gamma' = \Gamma$, that is, we did not add any new nonlogical axioms.

Then $\vdash_{\mathbf{T}'} \mathcal{A}[e_1, \ldots, e_n]$ iff $\vdash_{\mathbf{T}} \mathcal{A}[x_1, \ldots, x_n]$ for any choice of variables $x_1, \ldots, x_n$.




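Proof. If part. Trivially, $\vdash_{\mathbf{T}} \mathcal{A}[x_1, \ldots, x_n]$ implies $\vdash_{\mathbf{T}'} \mathcal{A}[x_1, \ldots, x_n]$; hence $\vdash_{\mathbf{T}'} \mathcal{A}[e_1, \ldots, e_n]$ by I.4.12.

Only-if part. Choose variables $y_1, \ldots, y_n$ that occur nowhere in $\mathcal{A}[e_1, \ldots, e_n]$ as either free or bound. By I.4.15, $\vdash_{\mathbf{T}} \mathcal{A}[y_1, \ldots, y_n]$; hence, by I.4.12 and I.4.13, $\vdash_{\mathbf{T}} \mathcal{A}[x_1, \ldots, x_n]$.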


I.4.17 Remark. Thus, the extension $\mathbf{T}'$ of $\mathbf{T}$ is conservative; for, if $\mathcal{A}$ is over $L$, then $\mathcal{A}[e_1, \ldots, e_n] \equiv \mathcal{A}$. Therefore, if $\vdash_{\mathbf{T}'} \mathcal{A}$, then $\vdash_{\mathbf{T}'} \mathcal{A}[e_1, \ldots, e_n]$; hence $\vdash_{\mathbf{T}} \mathcal{A}[x_1, \ldots, x_n]$, that is, $\vdash_{\mathbf{T}} \mathcal{A}$.

A more emphatic way to put the above is this: $\mathbf{T}'$ is not aware of any new nonlogical facts that $\mathbf{T}$ did not already know, albeit by a different name. If $\mathbf{T}'$ can prove $\mathcal{A}[e_1, \ldots, e_n]$, then $\mathbf{T}$ can prove the same statement, using any names (other than the $e_i$) that are meaningful in its own language; namely, it can prove $\mathcal{A}[x_1, \ldots, x_n]$.

The following corollary stems from the proof (rather than the statement) of I.4.15 and I.4.16, and is important.

I.4.18 Corollary. Let $e_1, \ldots, e_n$ be constants that do not appear in the nonlogical axioms $\Gamma$. Then, if $x_1, \ldots, x_n$ are any variables, and if $\Gamma \vdash \mathcal{A}[e_1, \ldots, e_n]$, it is also the case that $\Gamma \vdash \mathcal{A}[x_1, \ldots, x_n]$.


I.4.19 Metatheorem (The Deduction Theorem). For any closed formula $\mathcal{A}$, arbitrary formula $\mathcal{B}$, and set of formulas $\Gamma$, if $\Gamma + \mathcal{A} \vdash \mathcal{B}$, then $\Gamma \vdash \mathcal{A} \to \mathcal{B}$.

N.B. $\Gamma + \mathcal{A}$ denotes the augmentation of $\Gamma$ by adding the formula $\mathcal{A}$. In the present metatheorem $\mathcal{A}$ is a single (but unspecified) formula. However, the notation extends to the case where $\mathcal{A}$ is a schema, in which case it means the augmentation of $\Gamma$ by adding all the instances of the schema.

A converse of the metatheorem is also true trivially: That is, $\Gamma \vdash \mathcal{A} \to \mathcal{B}$ implies $\Gamma + \mathcal{A} \vdash \mathcal{B}$. This direction immediately follows by modus ponens and does not require the restriction on $\mathcal{A}$.


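Proof. The proof is by induction on $\Gamma + \mathcal{A}$-theorems.

Basis. Let $\mathcal{B}$ be logical or nonlogical (but, in the latter case, assume $\mathcal{B} \not\equiv \mathcal{A}$). Then $\Gamma \vdash \mathcal{B}$. Since $\mathcal{B} \models_{taut} \mathcal{A} \to \mathcal{B}$, it follows by I.4.1 that $\Gamma \vdash \mathcal{A} \to \mathcal{B}$.

Now, if $\mathcal{B} \equiv \mathcal{A}$, then $\mathcal{A} \to \mathcal{B}$ is a logical axiom (group Ax1); hence $\Gamma \vdash \mathcal{A} \to \mathcal{B}$ once more.

Modus ponens. Let $\Gamma + \mathcal{A} \vdash \mathcal{C}$ and $\Gamma + \mathcal{A} \vdash \mathcal{C} \to \mathcal{B}$. By I.H., $\Gamma \vdash \mathcal{A} \to \mathcal{C}$ and $\Gamma \vdash \mathcal{A} \to \mathcal{C} \to \mathcal{B}$. Since $\mathcal{A} \to \mathcal{C},\ \mathcal{A} \to \mathcal{C} \to \mathcal{B} \models_{taut} \mathcal{A} \to \mathcal{B}$, we have $\Gamma \vdash \mathcal{A} \to \mathcal{B}$.

$\exists$-introduction. Let $\Gamma + \mathcal{A} \vdash \mathcal{C} \to \mathcal{D}$ and $\mathcal{B} \equiv (\exists x)\mathcal{C} \to \mathcal{D}$, where $x$ is not free in $\mathcal{D}$. By the I.H., $\Gamma \vdash \mathcal{A} \to \mathcal{C} \to \mathcal{D}$. By I.4.1, $\Gamma \vdash \mathcal{C} \to \mathcal{A} \to \mathcal{D}$; hence $\Gamma \vdash (\exists x)\mathcal{C} \to \mathcal{A} \to \mathcal{D}$ by $\exists$-introduction ($\mathcal{A}$ is closed). One more application of I.4.1 yields $\Gamma \vdash \mathcal{A} \to (\exists x)\mathcal{C} \to \mathcal{D}$.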


I.4.20 Remark. (1) Is the restriction that $\mathcal{A}$ must be closed important? Yes. Let $\mathcal{A} \equiv x = a$, where “$a$” is some constant. Then, even though $\mathcal{A} \vdash (\forall x)\mathcal{A}$ by generalization, it is not always true† that $\vdash \mathcal{A} \to (\forall x)\mathcal{A}$. This follows from soundness considerations (next section). Intuitively, assuming that our logic “doesn’t lie” (that is, it proves no “invalid” formulas), we immediately infer that $x = a \to (\forall x)x = a$ cannot be absolutely provable, for it is a “lie”. It fails at least over N, if $a$ is interpreted to be “0”.


(2) I.4.16 adds flexibility to applications of the deduction theorem:

$$\vdash_{\mathbf{T}} (\mathcal{A} \to \mathcal{B})[x_1, \ldots, x_n] \tag{$*$}$$

where $[x_1, \ldots, x_n]$ is the list of all free variables just in $\mathcal{A}$, is equivalent (by I.4.16) to

$$\vdash_{\mathbf{T}'} (\mathcal{A} \to \mathcal{B})[e_1, \ldots, e_n] \tag{$**$}$$

where $e_1, \ldots, e_n$ are new constants added to $\mathcal{V}$ (with no effect on nonlogical axioms: $\Gamma = \Gamma'$). Now, since $\mathcal{A}[e_1, \ldots, e_n]$ is closed, proving

$$\Gamma' + \mathcal{A}[e_1, \ldots, e_n] \vdash \mathcal{B}[e_1, \ldots, e_n]$$

establishes ($**$), hence also ($*$).

In practice, one does not perform this step explicitly, but ensures that, throughout the $\Gamma + \mathcal{A}$-proof, whatever free variables were present in $\mathcal{A}$ “behaved like constants”, or, as we also say, were “frozen”.

(3) In some expositions the deduction theorem is not constrained by requiring that $\mathcal{A}$ be closed (e.g., Bourbaki (1966b), and more recently Enderton (1972)).

Which version is right? Both are, in their respective contexts. If all the primary rules of inference are “propositional” (e.g., as in Bourbaki (1966b) and Enderton (1972), who only employ modus ponens), that is, these rules do not meddle with quantifiers, then the deduction theorem is unconstrained. If, on the other hand, full generalization, namely, $\mathcal{A} \vdash (\forall x)\mathcal{A}$, is a permissible rule (primary or derived), then one cannot avoid constraining the application of the


† That is, it is not true in the metatheory that we can prove $\mathcal{A} \to (\forall x)\mathcal{A}$ without nonlogical axioms (absolutely).
deduction theorem, lest one want to derive (the invalid) $\vdash \mathcal{A} \to (\forall x)\mathcal{A}$ from the valid $\mathcal{A} \vdash (\forall x)\mathcal{A}$.

This also entails that approaches such as in Bourbaki (1966b) and Enderton (1972) do not derive full generalization. They only allow a weaker rule, “if $\vdash \mathcal{A}$, then $\vdash (\forall x)\mathcal{A}$”.†

(4) This divergence of approach in choosing rules of inference has some additional repercussions. One has to be careful in defining the semantic counterpart of $\vdash$, namely, $\models$ (see next section). One wants the two symbols to track each other faithfully (Gödel’s completeness theorem).‡ □
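Item (1) of the remark claims that $x = a \to (\forall x)x = a$ is a “lie”. Speaking intuitively, this can be seen by evaluating the formula in a small structure, one value of the free variable $x$ at a time. The sketch below is only an illustration: the two-element domain, the interpretation of $a$ as 0, and the name `holds` are assumptions of the example (the text itself points to N with $a$ read as 0).

```python
# Evaluate  x = a -> (forall x) x = a  in a finite structure.
# The domain, the interpretation of a, and all names are illustrative assumptions.

def holds(domain, a_value, x_value):
    """Truth of the implication when the free x denotes x_value."""
    antecedent = (x_value == a_value)                 # x = a
    consequent = all(d == a_value for d in domain)    # (forall x) x = a
    return (not antecedent) or consequent

domain = [0, 1]
a_value = 0          # interpret the constant a as 0
results = {x: holds(domain, a_value, x) for x in domain}
```

The implication fails exactly when the free $x$ denotes the same object as $a$: the antecedent is then true, while the universally quantified consequent is false as soon as the domain has a second element.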


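I.4.21 Corollary (Proof by Contradiction). Let $\mathcal{A}$ be closed. Then $\Gamma \vdash \mathcal{A}$ iff $\Gamma + \neg\mathcal{A}$ is inconsistent.

Proof. If part. Given that $\mathbf{Thm}_{\Gamma + \neg\mathcal{A}}$ = Wff. In particular, $\Gamma + \neg\mathcal{A} \vdash \mathcal{A}$. By the deduction theorem, $\Gamma \vdash \neg\mathcal{A} \to \mathcal{A}$. But $\neg\mathcal{A} \to \mathcal{A} \models_{taut} \mathcal{A}$.

Only-if part. Given that $\Gamma \vdash \mathcal{A}$. Hence $\Gamma + \neg\mathcal{A} \vdash \mathcal{A}$ as well (recall I.3.18(2)). Of course, $\Gamma + \neg\mathcal{A} \vdash \neg\mathcal{A}$ too. Since $\mathcal{A}, \neg\mathcal{A} \models_{taut} \mathcal{B}$ for an arbitrary $\mathcal{B}$, we are done.

Pause. Is it necessary to assume that $\mathcal{A}$ is closed in I.4.21? Why?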


The following is important enough to merit stating. It follows from the type 
of argument we employed in the only-if part above. 


I.4.22 Metatheorem. $\mathbf{T}$ is inconsistent iff for some $\mathcal{A}$, both $\vdash_{\mathbf{T}} \mathcal{A}$ and $\vdash_{\mathbf{T}} \neg\mathcal{A}$ hold.


We also list below a number of quotable proof techniques. These techniques 
are routinely used by mathematicians, and will be routinely used by us in what 
follows. The proofs of all the following metatheorems are delegated to the 
reader. 


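I.4.23 Metatheorem (Distributivity or Monotonicity of $\exists$). For any $x$, $\mathcal{A}$, $\mathcal{B}$,

$$\mathcal{A} \to \mathcal{B} \vdash (\exists x)\mathcal{A} \to (\exists x)\mathcal{B}$$

Proof. See Exercise I.11.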


† Indeed, they allow a bit more generality, namely, the rule “if $\Gamma \vdash \mathcal{A}$ with a side condition, then
$\Gamma \vdash (\forall x)\mathcal{A}$. The side condition is that the formulas of $\Gamma$ do not have free occurrences of $x$”. Of
course, $\Gamma$ can always be taken to be finite (why?), so that this condition is not unrealistic.

‡ In Mendelson (1987) $\models$ is defined inconsistently with $\vdash$.

I.4.24 Metatheorem (Distributivity or Monotonicity of $\forall$). For any $x$, $\mathcal{A}$, $\mathcal{B}$,

$$\mathcal{A} \to \mathcal{B} \vdash (\forall x)\mathcal{A} \to (\forall x)\mathcal{B}$$

Proof. See Exercise I.12.


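The term “monotonicity” is inspired by thinking of “$\to$” as “$\le$”. How? Well,
we have the tautology

$$(\mathcal{A} \to \mathcal{B}) \leftrightarrow (\mathcal{A} \lor \mathcal{B} \leftrightarrow \mathcal{B}) \qquad (i)$$

If we think of “$\mathcal{A} \lor \mathcal{B}$” as “$\max(\mathcal{A}, \mathcal{B})$”, then the right hand side in (i) above
says that $\mathcal{B}$ is the maximum of $\mathcal{A}$ and $\mathcal{B}$. Or that $\mathcal{A}$ is “less than or equal to”
$\mathcal{B}$. The above metatheorems say that both $\exists$ and $\forall$ preserve this “inequality”.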
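
The tautology (i) and the “max” reading can be checked by brute force over the four valuations. The following is only a small Python sketch, not part of the original exposition; the helper names (`implies`, `formula_i`) are illustrative assumptions, and truth values are modelled by Python booleans, for which `False < True`.

```python
from itertools import product

def implies(p: bool, q: bool) -> bool:
    """Material implication p -> q."""
    return (not p) or q

def formula_i(a: bool, b: bool) -> bool:
    """(A -> B) <-> ((A or B) <-> B), i.e. formula (i) of the text."""
    return implies(a, b) == ((a or b) == b)

VALUATIONS = list(product([False, True], repeat=2))

# Formula (i) is a tautology: it is true under every valuation of A, B.
is_tautology = all(formula_i(a, b) for a, b in VALUATIONS)

# With False < True, "or" behaves as max and "->" behaves as <=.
or_is_max = all((a or b) == max(a, b) for a, b in VALUATIONS)
implies_is_le = all(implies(a, b) == (a <= b) for a, b in VALUATIONS)

print(is_tautology, or_is_max, implies_is_le)
```

This only illustrates the propositional observation behind the name “monotonicity”; it says nothing about the quantifier metatheorems themselves, whose proofs are left to the exercises.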


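I.4.25 Metatheorem (The Equivalence Theorem, or Leibniz Rule). Let
$\Gamma \vdash \mathcal{A} \leftrightarrow \mathcal{B}$, and let $\mathcal{C}'$ be obtained from $\mathcal{C}$ by replacing some (possibly,
but not necessarily, all) occurrences of a subformula $\mathcal{A}$ of $\mathcal{C}$ by $\mathcal{B}$. Then
$\Gamma \vdash \mathcal{C} \leftrightarrow \mathcal{C}'$, i.e.,

$$\frac{\mathcal{A} \leftrightarrow \mathcal{B}}{\mathcal{C} \leftrightarrow \mathcal{C}'}$$

is a derived rule.


Proof. The proof is by induction on formulas $\mathcal{C}$. See Exercise I.14.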


Equational or calculational predicate logic is a particular foundation of first 
order logic that uses the above Leibniz rule as the primary rule of inference. 
In applying such logic one prefers to write proofs as chains of equivalences. 
Most equivalences in such a chain stem from an application of the rule. See 
Dijkstra and Scholten (1990), Gries and Schneider (1994), Tourlakis (2000a,
2000b, 2001).


I.4.26 Metatheorem (Proof by Cases). Suppose that $\Gamma \vdash \mathcal{A}_1 \lor \cdots \lor \mathcal{A}_n$
and $\Gamma \vdash \mathcal{A}_i \to \mathcal{B}$ for $i = 1, \ldots, n$. Then $\Gamma \vdash \mathcal{B}$.


Proof. Immediate, by I.4.1.


Proof by cases usually benefits from the application of the deduction theorem.
That is, having established $\Gamma \vdash \mathcal{A}_1 \lor \cdots \lor \mathcal{A}_n$, one then proceeds to adopt,
in turn, each $\mathcal{A}_i$ ($i = 1, \ldots, n$) as a new nonlogical axiom (with its variables
“frozen”). In each case ($\mathcal{A}_i$) one proceeds to prove $\mathcal{B}$. At the end of all this
one has established $\Gamma \vdash \mathcal{B}$.


In practice we normally use the following argot:


“We will consider cases $\mathcal{A}_i$ for $i = 1, \ldots, n$.
Case $\mathcal{A}_1$. ... therefore, $\mathcal{B}$.†
Case $\mathcal{A}_n$. ... therefore, $\mathcal{B}$.”


I.4.27 Metatheorem (Proof by Auxiliary Constant). Suppose that for formulas
$\mathcal{A}$ and $\mathcal{B}$ over the language $L$ we know


(1) $\Gamma \vdash (\exists x)\mathcal{A}[x]$,

(2) $\Gamma + \mathcal{A}[a] \vdash \mathcal{B}$, where $a$ is a new constant not in the language $L$ of $\Gamma$.
Furthermore assume that in the proof of $\mathcal{B}$ all the free variables of $\mathcal{A}[a]$
were frozen.


Then $\Gamma \vdash \mathcal{B}$.


Proof. Exercise I.18.


The technique that flows from this metatheorem is used often in practice. For 
example, in projective geometry axiomatized as in Veblen and Young (1916), in 
order to prove Desargues’s theorem on perspective triangles on the plane we use 
some arbitrary point (this is the auxiliary constant) off the plane, having verified 
that the axioms guarantee that such a point exists. It is important to note that De- 
sargues’s theorem does not refer to this point at all — hence the term “auxiliary”. 


Note. In this example, from projective geometry, “$\mathcal{B}$” is Desargues’s theorem,
“$(\exists x)\mathcal{A}[x]$” asserts that there are points outside the plane, $a$ is an arbitrary such
point, and the proof (2) starts with words like “Let $a$ be a point off the plane”,
which is argot for “add the axiom $\mathcal{A}[a]$”.


1.5. Semantics 


So what do all these symbols mean? We show in this section how to decode 
the formal statements (formulas) into informal statements of real mathematics. 
Conversely, this will entail an understanding of how to code statements of real 
mathematics in our formal language. 


† That is, we add the axiom $\mathcal{A}_1$ to $\Gamma$, freezing its variables, and we then prove $\mathcal{B}$.

The rigorous† definition of semantics for first order languages is due to
Tarski and is often referred to as “Tarski semantics”. The flavour of the particular
definition given below is that of Shoenfield (1967), and it accurately reflects our
syntactic choices, most importantly the choice to permit “full” generalization
$\mathcal{A} \vdash (\forall x)\mathcal{A}$. In particular, we will define the semantic counterpart of $\vdash$,
namely, $\models$, pronounced “logically implies”, to ensure that $\Gamma \vdash \mathcal{A}$ iff $\Gamma \models \mathcal{A}$. This
is the content of Gödel’s completeness theorem, which we state without proof
in this section (for a proof see, e.g., our volume 1, Mathematical Logic).


This section will assume some knowledge of notation and elementary facts
from Cantorian (naive) set theory. We will, among other things, make use of
notation such as

$$A^n \quad (\text{or } \underbrace{A \times \cdots \times A}_{n \text{ times}})$$

for the set of ordered $n$-tuples of members of $A$. We will also use the symbols
$\subseteq$, $\cup$, $\bigcup_{a \in I}$.‡


I.5.1 Definition. Given a language $L = (\mathcal{V}, \mathrm{Term}, \mathrm{Wff})$, a structure $\mathfrak{M} =
(M, \mathcal{I})$ appropriate for $L$ is such that $M \neq \emptyset$ is a set (the domain or underlying
set or universe§) and $\mathcal{I}$ (“$\mathcal{I}$” for interpretation) is a mapping that assigns


(1) to each constant $a$ of $\mathcal{V}$ a unique member $a^{\mathcal{I}} \in M$,

(2) to each function $f$ of $\mathcal{V}$, of arity $n$, a unique (total)¶ function
$f^{\mathcal{I}} : M^n \to M$,

(3) to each predicate $P$ of $\mathcal{V}$, of arity $n$, a unique set $P^{\mathcal{I}} \subseteq M^n$.‖


I.5.2 Remark. The structure $\mathfrak{M}$ is often written more verbosely, in conformity
with practice in algebra. Namely, one unpacks the $\mathcal{I}$ into a list $a^{\mathcal{I}}, b^{\mathcal{I}}, \ldots$;
$f^{\mathcal{I}}, g^{\mathcal{I}}, \ldots$; $P^{\mathcal{I}}, Q^{\mathcal{I}}, \ldots$ and writes instead
$\mathfrak{M} = (M; a^{\mathcal{I}}, b^{\mathcal{I}}, \ldots; f^{\mathcal{I}}, g^{\mathcal{I}}, \ldots; P^{\mathcal{I}}, Q^{\mathcal{I}}, \ldots)$.
Under this understanding, a structure is an underlying set (universe), $M$, along
with a list of “concrete” constants, functions, and relations
that “interpret” corresponding “abstract” items of the language.

† One often says “The formal definition of semantics ...”, but the word “formal” is misleading
here, for we are actually defining semantics in the metatheory (in “real” mathematics), not in
some formal theory.

‡ If we have a set of sets $\{S_a, S_b, S_c, \ldots\}$, where the indices $a, b, c, \ldots$ all come out of an index
set $I$, then the symbol $\bigcup_{i \in I} S_i$ stands for the collection of all those objects $x$ that are found in at
least one of the sets $S_i$. It is a common habit to write $\bigcup_{i=0}^{\infty} S_i$ instead of $\bigcup_{i \in \mathbb{N}} S_i$. $A \cup B$ is the
same as $\bigcup_{i \in \{1,2\}} S_i$, where we have let $S_1 = A$ and $S_2 = B$.

§ Often the qualification “of discourse” is added to the terms “domain” and “universe”.

¶ Requiring $f^{\mathcal{I}}$ to be total is a traditional convention. By the way, total means that $f^{\mathcal{I}}$ is defined
everywhere on $M^n$.

‖ Thus $P^{\mathcal{I}}$ is an $n$-ary relation with inputs and outputs in $M$.


Under the latter notational circumstances we often use the symbols $a^{\mathfrak{M}}$, $f^{\mathfrak{M}}$,
$P^{\mathfrak{M}}$ (rather than $a^{\mathcal{I}}$, etc.) to indicate the interpretations in $\mathfrak{M}$ of the constant $a$,
function $f$, and predicate $P$ respectively.


We have said above “structure appropriate for L”, thus emphasizing the 
generality of the language and therefore our ability to interpret what we say in 
it in many different ways. 

Often though (e.g., as in formal arithmetic and set theory), we have a structure
in mind to begin with, and then build a formal language to formally codify
statements about the objects in the structure. Under these circumstances, in
effect, we define a language appropriate for the structure. We use the symbol
$L_{\mathfrak{M}}$ to indicate that the language was built to fit the structure $\mathfrak{M}$.


I.5.3 Definition. We routinely add symbols to a language $L$ (by adding new
nonlogical symbols) to obtain a language $L'$. We say that $L'$ is an extension of
$L$ and that $L$ is a restriction of $L'$. Suppose that $\mathfrak{M} = (M, \mathcal{I})$ is a structure
for $L$, and let $\mathfrak{M}' = (M, \mathcal{I}')$ be a structure with the same underlying set $M$,
but with $\mathcal{I}$ extended to $\mathcal{I}'$ so that the latter gives meaning to all new symbols
while it gives the same meaning as $\mathcal{I}$ does to the symbols of $L$.

We call $\mathfrak{M}'$ an expansion (rather than extension) of $\mathfrak{M}$, and $\mathfrak{M}$ a reduct
(rather than restriction) of $\mathfrak{M}'$. We often write $\mathcal{I} = \mathcal{I}' \upharpoonright L$ to indicate that the
mapping $\mathcal{I}'$, restricted to $L$ (symbol “$\upharpoonright$”), equals $\mathcal{I}$.


I.5.4 Definition. Given $L$ and a structure $\mathfrak{M} = (M, \mathcal{I})$ appropriate for $L$.
$L(\mathfrak{M})$ denotes the language obtained from $L$ by adding to $\mathcal{V}$ a unique new
name $\bar{i}$ for each object $i \in M$.

This amends the sets Term, Wff into Term($\mathfrak{M}$), Wff($\mathfrak{M}$). Members of the
latter sets are called $\mathfrak{M}$-terms and $\mathfrak{M}$-formulas respectively.

We extend the mapping $\mathcal{I}$ to the new constants by: $\bar{i}^{\mathcal{I}} = i$ for all $i \in M$
(where the “=” here is metamathematical: equality on $M$).


All that we have done here is to allow ourselves to do substitutions like $[x \leftarrow i]$
formally. We do instead $[x \leftarrow \bar{i}]$. One next gives “meaning” to all closed terms
in $L(\mathfrak{M})$. The following uses definition by recursion (I.2.13) and relies on the
fact that the rules that define terms are unambiguous.
I.5.5 Definition. For closed terms $t$ in Term($\mathfrak{M}$) we define the symbol $t^{\mathcal{I}} \in M$
inductively:


(1) If $t$ is either $a$ (original constant) or $\bar{i}$ (imported constant), then $t^{\mathcal{I}}$ has
already been defined.

(2) If $t$ is the string $f t_1 \ldots t_n$, where $f$ is $n$-ary and $t_1, \ldots, t_n$ are closed
$\mathfrak{M}$-terms, we define $t^{\mathcal{I}}$ to be the object (of $M$) $f^{\mathcal{I}}(t_1^{\mathcal{I}}, \ldots, t_n^{\mathcal{I}})$.


Finally, we give meaning to all closed $\mathfrak{M}$-formulas, again by recursion (over
Wff).


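I.5.6 Definition. For any closed formula $\mathcal{A}$ in Wff($\mathfrak{M}$) we define the symbol
$\mathcal{A}^{\mathcal{I}}$ inductively. In all cases, $\mathcal{A}^{\mathcal{I}} \in \{\mathbf{t}, \mathbf{f}\}$:


(1) If $\mathcal{A} \equiv t = s$, where $t$ and $s$ are closed $\mathfrak{M}$-terms, then $\mathcal{A}^{\mathcal{I}} = \mathbf{t}$ iff
$t^{\mathcal{I}} = s^{\mathcal{I}}$. (The last two occurrences of “=” are metamathematical.)

(2) If $\mathcal{A} \equiv P t_1 \ldots t_n$, where $P$ is an $n$-ary predicate and the $t_i$ are closed
$\mathfrak{M}$-terms, then $\mathcal{A}^{\mathcal{I}} = \mathbf{t}$ iff $\langle t_1^{\mathcal{I}}, \ldots, t_n^{\mathcal{I}} \rangle \in P^{\mathcal{I}}$ or $P^{\mathcal{I}}(t_1^{\mathcal{I}}, \ldots, t_n^{\mathcal{I}})$ holds.
(Or “is true”; see p. 20. Of course, the last occurrence of “=” is
metamathematical.)

(3) If $\mathcal{A}$ is any of the sentences $\neg\mathcal{B}$, $\mathcal{B} \lor \mathcal{C}$, then $\mathcal{A}^{\mathcal{I}}$ is determined by
the usual truth tables (see p. 31) using the values $\mathcal{B}^{\mathcal{I}}$ and $\mathcal{C}^{\mathcal{I}}$. That is,
$(\neg\mathcal{B})^{\mathcal{I}} = F_{\neg}(\mathcal{B}^{\mathcal{I}})$ and $(\mathcal{B} \lor \mathcal{C})^{\mathcal{I}} = F_{\lor}(\mathcal{B}^{\mathcal{I}}, \mathcal{C}^{\mathcal{I}})$. (The last two
occurrences of “=” are metamathematical.)

(4) If $\mathcal{A} \equiv (\exists x)\mathcal{B}$, then $\mathcal{A}^{\mathcal{I}} = \mathbf{t}$ iff $(\mathcal{B}[x \leftarrow \bar{i}])^{\mathcal{I}} = \mathbf{t}$ for some $i \in M$.
(The last two occurrences of “=” are metamathematical.)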


We have “imported” constants from $M$ into $L$ in order to be able to state
the semantics of $(\exists x)\mathcal{B}$ above in the simple manner we just did (following
Shoenfield (1967)).

We often state the semantics of $(\exists x)\mathcal{B}$ by writing

$$((\exists x)\mathcal{B}[x])^{\mathcal{I}} \text{ is true iff } (\exists i \in M)\,(\mathcal{B}[\bar{i}])^{\mathcal{I}} \text{ is true}$$
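
For a finite domain, Definitions I.5.5 and I.5.6 can be run as a program. The sketch below is an illustration only, under stated assumptions: formulas and terms are encoded as nested Python tuples (an encoding chosen here, not taken from the text), the interpretation is a dictionary `I`, and the tags `"imp"`, `"const"`, `"app"`, `"eq"`, `"pred"`, `"not"`, `"or"`, `"ex"` are hypothetical names. The quantifier clause follows clause (4) literally: it substitutes an imported constant for the bound variable and evaluates the resulting closed formula.

```python
def subst_term(t, x, i):
    """t[x <- i-bar]: replace variable x by the imported constant naming i."""
    if t[0] == "var":
        return ("imp", i) if t[1] == x else t
    if t[0] == "app":
        return ("app", t[1], tuple(subst_term(r, x, i) for r in t[2]))
    return t  # original constant or imported constant: unchanged

def subst(A, x, i):
    """A[x <- i-bar] on formulas; occurrences bound by (Ex) are left alone."""
    kind = A[0]
    if kind == "eq":
        return ("eq", subst_term(A[1], x, i), subst_term(A[2], x, i))
    if kind == "pred":
        return ("pred", A[1], tuple(subst_term(r, x, i) for r in A[2]))
    if kind == "not":
        return ("not", subst(A[1], x, i))
    if kind == "or":
        return ("or", subst(A[1], x, i), subst(A[2], x, i))
    # kind == "ex"
    return A if A[1] == x else ("ex", A[1], subst(A[2], x, i))

def term_value(t, I):
    """Value of a closed term, as in I.5.5."""
    if t[0] == "imp":
        return t[1]                      # i-bar denotes i
    if t[0] == "const":
        return I[t[1]]
    if t[0] == "app":
        return I[t[1]](*(term_value(r, I) for r in t[2]))
    raise ValueError("term is not closed")

def truth(A, M, I):
    """Truth value of a closed formula, following I.5.6(1)-(4)."""
    kind = A[0]
    if kind == "eq":
        return term_value(A[1], I) == term_value(A[2], I)
    if kind == "pred":
        return tuple(term_value(r, I) for r in A[2]) in I[A[1]]
    if kind == "not":
        return not truth(A[1], M, I)
    if kind == "or":
        return truth(A[1], M, I) or truth(A[2], M, I)
    # (Ex)B is true iff B[x <- i-bar] is true for some i in M
    return any(truth(subst(A[2], A[1], i), M, I) for i in M)

# A hypothetical three-element structure with a constant and a binary relation.
M = {0, 1, 2}
I = {"zero": 0, "lt": {(0, 1), (0, 2), (1, 2)}}

exists_above_zero = ("ex", "x", ("pred", "lt", (("const", "zero"), ("var", "x"))))
exists_below_zero = ("ex", "x", ("pred", "lt", (("var", "x"), ("const", "zero"))))
# not (Ex) not (Ey) x < y, i.e. "every element has a larger one"
every_has_larger = ("not", ("ex", "x", ("not",
    ("ex", "y", ("pred", "lt", (("var", "x"), ("var", "y")))))))
```

In this toy structure the first sentence evaluates to true, the second to false, and the third to false (the element 2 has nothing above it). For an infinite domain the `any(...)` loop is of course not an algorithm; the definition in the text is metamathematical, not computational.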


I.5.7 Definition. Let $\mathcal{A} \in \mathrm{Wff}$, and $\mathfrak{M}$ be a structure as above.


An $\mathfrak{M}$-instance of $\mathcal{A}$ is an $\mathfrak{M}$-sentence $\mathcal{A}(\bar{i}_1, \ldots, \bar{i}_k)$ (that is, all the free
variables of $\mathcal{A}$ have been replaced by imported constants).

We say that $\mathcal{A}$ is valid in $\mathfrak{M}$, or that $\mathfrak{M}$ is a model of $\mathcal{A}$, iff for all
$\mathfrak{M}$-instances $\mathcal{A}'$ of $\mathcal{A}$ it is the case that $\mathcal{A}'^{\mathcal{I}} = \mathbf{t}$.† Under these circumstances we
write $\models_{\mathfrak{M}} \mathcal{A}$.

For any set of formulas $\Gamma$ from Wff, $\models_{\mathfrak{M}} \Gamma$, pronounced “$\mathfrak{M}$ is a model of
$\Gamma$”, means that $\models_{\mathfrak{M}} \mathcal{A}$ for all $\mathcal{A} \in \Gamma$.

A formula $\mathcal{A}$ is universally valid or logically valid (we often say just valid)
iff every structure appropriate for the language is a model of $\mathcal{A}$.
Under these circumstances we simply write $\models \mathcal{A}$.

If $\Gamma$ is a set of formulas, then we say it is satisfiable iff it has a model. It is
finitely satisfiable iff every finite subset of $\Gamma$ has a model.†

† We henceforth discontinue our pedantic “(The last occurrence of “=” is metamathematical.)”.


Contrast the concept of satisfiability here with that of propositional satisfiability
(I.3.6). The definition of validity of $\mathcal{A}$ in a structure $\mathfrak{M}$ corresponds with the
normal mathematical practice. It says that a formula is true (in a given “context”
$\mathfrak{M}$) just in case it is so for all possible values of the free variables.


I.5.8 Definition. We say that $\Gamma$ logically implies $\mathcal{A}$, in symbols $\Gamma \models \mathcal{A}$, to
mean that every model of $\Gamma$ is also a model of $\mathcal{A}$.


I.5.9 Definition (Soundness). A theory (identified by its nonlogical axioms)
$\Gamma$ is sound iff, for all $\mathcal{A} \in \mathrm{Wff}$, $\Gamma \vdash \mathcal{A}$ implies $\Gamma \models \mathcal{A}$, that is, iff all the
theorems of the theory are logically implied by the nonlogical axioms.


Clearly then, a pure theory $\mathcal{T}$ is sound iff $\vdash_{\mathcal{T}} \mathcal{A}$ implies $\models \mathcal{A}$ for all
$\mathcal{A} \in \mathrm{Wff}$. That is, all its theorems are universally valid.


Towards the soundness result‡ below we look at two tedious (but easy)
lemmata.


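† These two concepts are often defined just for sentences.
‡ Also nicknamed “the easy half of Gödel’s completeness theorem”.


I.5.10 Lemma. Given a term $t$, variables $x \not\equiv y$, where $y$ does not occur in $t$,
and a constant $a$. Then, for any term $s$ and formula $\mathcal{A}$,
$s[x \leftarrow t][y \leftarrow a] \equiv s[y \leftarrow a][x \leftarrow t]$ and
$\mathcal{A}[x \leftarrow t][y \leftarrow a] \equiv \mathcal{A}[y \leftarrow a][x \leftarrow t]$.


Proof. Induction on $s$: Basis:

$$
s[x \leftarrow t][y \leftarrow a] \equiv
\begin{cases}
t & \text{if } s \equiv x \\
a & \text{if } s \equiv y \\
z & \text{if } s \equiv z, \text{ where } x \not\equiv z \not\equiv y \\
b & \text{if } s \equiv b
\end{cases}
\;\equiv\; s[y \leftarrow a][x \leftarrow t]
$$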
58 I. A Bit of Logic: A User’s Toolbox 


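For the induction step let $s \equiv f r_1 \ldots r_n$, where $f$ has arity $n$. Then

$$
\begin{aligned}
s[x \leftarrow t][y \leftarrow a] &\equiv f\, r_1[x \leftarrow t][y \leftarrow a] \ldots r_n[x \leftarrow t][y \leftarrow a] \\
&\equiv f\, r_1[y \leftarrow a][x \leftarrow t] \ldots r_n[y \leftarrow a][x \leftarrow t] && \text{by I.H.} \\
&\equiv s[y \leftarrow a][x \leftarrow t]
\end{aligned}
$$

Induction on $\mathcal{A}$: Basis:

$$
\mathcal{A}[x \leftarrow t][y \leftarrow a] \equiv
\begin{cases}
P\, r_1[x \leftarrow t][y \leftarrow a] \ldots r_n[x \leftarrow t][y \leftarrow a] \equiv P\, r_1[y \leftarrow a][x \leftarrow t] \ldots r_n[y \leftarrow a][x \leftarrow t] & \text{if } \mathcal{A} \equiv P r_1 \ldots r_n \\
r[x \leftarrow t][y \leftarrow a] = s[x \leftarrow t][y \leftarrow a] \equiv r[y \leftarrow a][x \leftarrow t] = s[y \leftarrow a][x \leftarrow t] & \text{if } \mathcal{A} \equiv r = s
\end{cases}
\;\equiv\; \mathcal{A}[y \leftarrow a][x \leftarrow t]
$$

The property we are proving, trivially, propagates with Boolean connectives.
Let us do the induction step just in the case where $\mathcal{A} \equiv (\exists w)\mathcal{B}$. If $w \equiv x$ or
$w \equiv y$, then the result is trivial. Otherwise,

$$
\begin{aligned}
\mathcal{A}[x \leftarrow t][y \leftarrow a] &\equiv ((\exists w)\mathcal{B})[x \leftarrow t][y \leftarrow a] \\
&\equiv ((\exists w)\mathcal{B}[x \leftarrow t][y \leftarrow a]) \\
&\equiv ((\exists w)\mathcal{B}[y \leftarrow a][x \leftarrow t]) && \text{by I.H.} \\
&\equiv ((\exists w)\mathcal{B})[y \leftarrow a][x \leftarrow t] \\
&\equiv \mathcal{A}[y \leftarrow a][x \leftarrow t]
\end{aligned}
$$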
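
The term half of Lemma I.5.10 is easy to try out on a concrete example. The following Python sketch is an illustration under assumptions made here (terms as nested tuples, and the names `subst`, `t_bad`, etc. are hypothetical); it also shows why the hypothesis “$y$ does not occur in $t$” cannot be dropped.

```python
def subst(s, var, repl):
    """s[var <- repl] for terms encoded as
    ("var", v), ("const", c) or ("app", f, (args...))."""
    if s[0] == "var":
        return repl if s[1] == var else s
    if s[0] == "app":
        return ("app", s[1], tuple(subst(r, var, repl) for r in s[2]))
    return s  # constants are untouched

x, y, z = ("var", "x"), ("var", "y"), ("var", "z")
a, b = ("const", "a"), ("const", "b")

# s is f(x, y, g(x, z)); t is g(z, b), in which y does not occur.
s = ("app", "f", (x, y, ("app", "g", (x, z))))
t = ("app", "g", (z, b))

left = subst(subst(s, "x", t), "y", a)    # s[x <- t][y <- a]
right = subst(subst(s, "y", a), "x", t)   # s[y <- a][x <- t]
commute = (left == right)

# If y DOES occur in the substituted term, the two orders can differ.
t_bad = ("app", "g", (y, z))
left_bad = subst(subst(s, "x", t_bad), "y", a)
right_bad = subst(subst(s, "y", a), "x", t_bad)
```

With `t` the two results coincide, while with `t_bad` the first order turns the `y` inside `t_bad` into `a` and the second leaves it in place. This is one example, not a proof; the induction in the text is what covers all terms and formulas.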


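I.5.11 Lemma. Given a structure $\mathfrak{M} = (M, \mathcal{I})$, a term $s$ and a formula $\mathcal{A}$,
both over $L(\mathfrak{M})$. Furthermore, each of $s$ and $\mathcal{A}$ have at most one free variable,
namely, $x$.

Let $t$ be a closed term over $L(\mathfrak{M})$ such that $t^{\mathcal{I}} = i \in M$. Then $(s[x \leftarrow t])^{\mathcal{I}} =
(s[x \leftarrow \bar{i}])^{\mathcal{I}}$ and $(\mathcal{A}[x \leftarrow t])^{\mathcal{I}} = (\mathcal{A}[x \leftarrow \bar{i}])^{\mathcal{I}}$. Of course, since $t$ is closed,
$\mathcal{A}[x \leftarrow t]$ is defined.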


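Proof. Induction on $s$: Basis: $s[x \leftarrow t] \equiv s$ if $s \in \{y, a, \bar{j}\}$ ($y \not\equiv x$). Hence
$(s[x \leftarrow t])^{\mathcal{I}} = s^{\mathcal{I}} = (s[x \leftarrow \bar{i}])^{\mathcal{I}}$ in this case. If $s \equiv x$, then $s[x \leftarrow t] \equiv t$ and
$s[x \leftarrow \bar{i}] \equiv \bar{i}$, and the claim follows once more.

For the induction step let $s \equiv f r_1 \ldots r_n$, where $f$ has arity $n$. Then

$$
\begin{aligned}
(s[x \leftarrow t])^{\mathcal{I}} &= f^{\mathcal{I}}\big((r_1[x \leftarrow t])^{\mathcal{I}}, \ldots, (r_n[x \leftarrow t])^{\mathcal{I}}\big) \\
&= f^{\mathcal{I}}\big((r_1[x \leftarrow \bar{i}])^{\mathcal{I}}, \ldots, (r_n[x \leftarrow \bar{i}])^{\mathcal{I}}\big) && \text{by I.H.} \\
&= (s[x \leftarrow \bar{i}])^{\mathcal{I}}
\end{aligned}
$$

Induction on $\mathcal{A}$: Basis: If $\mathcal{A} \equiv P r_1 \ldots r_n$, then†

$$
\begin{aligned}
(\mathcal{A}[x \leftarrow t])^{\mathcal{I}} &= P^{\mathcal{I}}\big((r_1[x \leftarrow t])^{\mathcal{I}}, \ldots, (r_n[x \leftarrow t])^{\mathcal{I}}\big) \\
&= P^{\mathcal{I}}\big((r_1[x \leftarrow \bar{i}])^{\mathcal{I}}, \ldots, (r_n[x \leftarrow \bar{i}])^{\mathcal{I}}\big) \\
&= (\mathcal{A}[x \leftarrow \bar{i}])^{\mathcal{I}}
\end{aligned}
$$

Similarly if $\mathcal{A} \equiv r = s$.

The property we are proving, clearly, propagates with Boolean connectives.
Let us do the induction step just in the case where $\mathcal{A} \equiv (\exists w)\mathcal{B}$. If $w \equiv x$ the
result is trivial. Otherwise, we note that, since $t$ is closed, $w$ does not occur
in $t$, and proceed as follows:

$$
\begin{aligned}
(\mathcal{A}[x \leftarrow t])^{\mathcal{I}} = \mathbf{t} \ &\text{iff } (((\exists w)\mathcal{B})[x \leftarrow t])^{\mathcal{I}} = \mathbf{t} \\
&\text{iff } ((\exists w)\mathcal{B}[x \leftarrow t])^{\mathcal{I}} = \mathbf{t} \\
&\text{iff } (\mathcal{B}[x \leftarrow t][w \leftarrow \bar{j}])^{\mathcal{I}} = \mathbf{t} \text{ for some } j \in M, \text{ by I.5.6(4)} \\
&\text{iff } (\mathcal{B}[w \leftarrow \bar{j}][x \leftarrow t])^{\mathcal{I}} = \mathbf{t} \text{ for some } j \in M, \text{ by I.5.10} \\
&\text{iff } ((\mathcal{B}[w \leftarrow \bar{j}])[x \leftarrow t])^{\mathcal{I}} = \mathbf{t} \text{ for some } j \in M \\
&\text{iff } ((\mathcal{B}[w \leftarrow \bar{j}])[x \leftarrow \bar{i}])^{\mathcal{I}} = \mathbf{t} \text{ for some } j \in M, \text{ by I.H.} \\
&\text{iff } (\mathcal{B}[w \leftarrow \bar{j}][x \leftarrow \bar{i}])^{\mathcal{I}} = \mathbf{t} \text{ for some } j \in M \\
&\text{iff } (\mathcal{B}[x \leftarrow \bar{i}][w \leftarrow \bar{j}])^{\mathcal{I}} = \mathbf{t} \text{ for some } j \in M, \text{ by I.5.10} \\
&\text{iff } ((\exists w)\mathcal{B}[x \leftarrow \bar{i}])^{\mathcal{I}} = \mathbf{t}, \text{ by I.5.6(4)} \\
&\text{iff } (((\exists w)\mathcal{B})[x \leftarrow \bar{i}])^{\mathcal{I}} = \mathbf{t} \\
&\text{iff } (\mathcal{A}[x \leftarrow \bar{i}])^{\mathcal{I}} = \mathbf{t}
\end{aligned}
$$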


† For a metamathematical relation $Q$, as usual (p. 20), $Q(a, b, \ldots) = \mathbf{t}$, or just $Q(a, b, \ldots)$, stands
for $\langle a, b, \ldots \rangle \in Q$.


I.5.12 Metatheorem (Soundness). Any first order theory (identified by its
nonlogical axioms) $\Gamma$, over some language $L$, is sound.


Proof. By induction on $\Gamma$-theorems $\mathcal{A}$, we prove that $\Gamma \models \mathcal{A}$. That is, we fix
a structure for $L$, say $\mathfrak{M}$, and assume that $\models_{\mathfrak{M}} \Gamma$. We then proceed to show that
$\models_{\mathfrak{M}} \mathcal{A}$.

Basis: If $\mathcal{A}$ is a nonlogical axiom, then our conclusion is part of the
assumption, by I.5.7.

If $\mathcal{A}$ is a logical axiom, there are a number of cases:


Case 1. $\models_{\mathrm{taut}} \mathcal{A}$. We fix an $\mathfrak{M}$-instance of $\mathcal{A}$, say $\mathcal{A}'$, and show that
$\mathcal{A}'^{\mathcal{I}} = \mathbf{t}$. Let $p_1, \ldots, p_n$ be all the propositional variables (alias,
prime formulas) occurring in $\mathcal{A}'$. Define a valuation $v$ by setting
$v(p_i) = p_i^{\mathcal{I}}$ for $i = 1, \ldots, n$. Clearly, $\mathbf{t} = \bar{v}(\mathcal{A}') = \mathcal{A}'^{\mathcal{I}}$ (the first “=”
because $\models_{\mathrm{taut}} \mathcal{A}'$, the second because after prime formulas have been
taken care of, all that remains to be done for the evaluation of $\mathcal{A}'^{\mathcal{I}}$ is
to apply Boolean connectives; see I.5.6(3)).


Pause. Why is $\models_{\mathrm{taut}} \mathcal{A}'$?


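Case 2. $\mathcal{A} \equiv \mathcal{B}[t] \to (\exists x)\mathcal{B}$. Again, we look at an $\mathfrak{M}$-instance $\mathcal{B}'[t'] \to
(\exists x)\mathcal{B}'$. We want $(\mathcal{B}'[t'] \to (\exists x)\mathcal{B}')^{\mathcal{I}} = \mathbf{t}$, but suppose instead that

$$(\mathcal{B}'[t'])^{\mathcal{I}} = \mathbf{t} \qquad (1)$$

and

$$((\exists x)\mathcal{B}')^{\mathcal{I}} = \mathbf{f} \qquad (2)$$

Let $t'^{\mathcal{I}} = i$ ($i \in M$). By I.5.11 and (1), $(\mathcal{B}'[\bar{i}])^{\mathcal{I}} = \mathbf{t}$. By I.5.6(4),
$((\exists x)\mathcal{B}')^{\mathcal{I}} = \mathbf{t}$, contradicting (2).

Case 3. $\mathcal{A} \equiv x = x$. Then an arbitrary $\mathfrak{M}$-instance is $\bar{i} = \bar{i}$ for some $i \in M$.
By I.5.6(1), $(\bar{i} = \bar{i})^{\mathcal{I}} = \mathbf{t}$.

Case 4. $\mathcal{A} \equiv t = s \to (\mathcal{B}[t] \leftrightarrow \mathcal{B}[s])$. Once more, we take an arbitrary
$\mathfrak{M}$-instance, $t' = s' \to (\mathcal{B}'[t'] \leftrightarrow \mathcal{B}'[s'])$. Suppose that $(t' = s')^{\mathcal{I}} = \mathbf{t}$.
That is, $t'^{\mathcal{I}} = s'^{\mathcal{I}} =$ (let us say) $i$ (in $M$). But then

$$
\begin{aligned}
(\mathcal{B}'[t'])^{\mathcal{I}} &= (\mathcal{B}'[\bar{i}])^{\mathcal{I}}, && \text{by I.5.11} \\
&= (\mathcal{B}'[s'])^{\mathcal{I}}, && \text{by I.5.11}
\end{aligned}
$$

Hence $(\mathcal{B}'[t'] \leftrightarrow \mathcal{B}'[s'])^{\mathcal{I}} = \mathbf{t}$.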


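For the induction step we have two cases:


Modus ponens. Let $\mathcal{B}$ and $\mathcal{B} \to \mathcal{A}$ be $\Gamma$-theorems. Fix an $\mathfrak{M}$-instance
$\mathcal{B}' \to \mathcal{A}'$. Since $\mathcal{B}', \mathcal{B}' \to \mathcal{A}' \models_{\mathrm{taut}} \mathcal{A}'$, the argument here is entirely
analogous to the case $\mathcal{A} \in \Lambda$ (hence we omit it).

$\exists$-introduction. Let $\mathcal{A} \equiv (\exists x)\mathcal{B} \to \mathcal{C}$ and $\Gamma \vdash \mathcal{B} \to \mathcal{C}$, where $x$ is not
free in $\mathcal{C}$. By the I.H.

$$\models_{\mathfrak{M}} \mathcal{B} \to \mathcal{C} \qquad (3)$$

Let $(\exists x)\mathcal{B}' \to \mathcal{C}'$ be an $\mathfrak{M}$-instance such that (despite expectations)
$((\exists x)\mathcal{B}')^{\mathcal{I}} = \mathbf{t}$ but

$$\mathcal{C}'^{\mathcal{I}} = \mathbf{f} \qquad (4)$$

Thus

$$(\mathcal{B}'[\bar{i}])^{\mathcal{I}} = \mathbf{t} \qquad (5)$$

for some $i \in M$. Since $x$ is not free in $\mathcal{C}$, $\mathcal{B}'[\bar{i}] \to \mathcal{C}'$ is a false (by (4) and
(5)) $\mathfrak{M}$-instance of $\mathcal{B} \to \mathcal{C}$, contradicting (3).


We have used the condition of $\exists$-introduction above, by saying “Since $x$ is not
free in $\mathcal{C}$, $\mathcal{B}'[\bar{i}] \to \mathcal{C}'$ is a[n] ... $\mathfrak{M}$-instance of $\mathcal{B} \to \mathcal{C}$”.

So the condition was useful. But is it essential? Yes, since, for example, if
$x \not\equiv y$, then $x = y \to x = y \not\models (\exists x)x = y \to x = y$.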
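
The counterexample can be checked concretely in any structure with at least two elements. The Python sketch below is an illustration only, assuming a hypothetical two-element domain; it enumerates all instances of the premise $x = y \to x = y$ and of the would-be conclusion $(\exists x)x = y \to x = y$, remembering that in the conclusion the $x$ under the quantifier is bound while the $x$ in the consequent is free.

```python
from itertools import product

M = [0, 1]  # hypothetical two-element domain

def implies(p: bool, q: bool) -> bool:
    """Material implication p -> q."""
    return (not p) or q

# Premise  x = y -> x = y : valid, since every instance (x, y) is true.
premise_valid = all(implies(x == y, x == y) for x, y in product(M, repeat=2))

def conclusion_instance(x: int, y: int) -> bool:
    """Instance of (Ex)(x = y) -> x = y with the free x, y set to the given values."""
    antecedent = any(x_bound == y for x_bound in M)  # (Ex) x = y
    return implies(antecedent, x == y)

# Instances of the conclusion that come out false.
false_instances = [(x, y) for x, y in product(M, repeat=2)
                   if not conclusion_instance(x, y)]
```

Here `premise_valid` is true, yet `false_instances` is nonempty (any pair of distinct elements works), so the structure is a model of the premise but not of the conclusion. This is exactly what the side condition “$x$ is not free in $\mathcal{C}$” rules out.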


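As a corollary of soundness we have the consistency of pure theories:

I.5.13 Corollary. Any first order pure theory is consistent.


Proof. Let $\mathcal{T}$ be a pure theory over some language $L$. Since $\not\models \neg x = x$, it
follows that $\nvdash_{\mathcal{T}} \neg x = x$, thus $\mathrm{Thm}_{\mathcal{T}} \neq \mathrm{Wff}$.


By $\nvdash \mathcal{A}$ and $\not\models \mathcal{A}$ we mean the metatheoretical statements “‘$\vdash \mathcal{A}$’ is false”
and “‘$\models \mathcal{A}$’ is false” respectively.


I.5.14 Corollary. Any first order theory that has a model is consistent.


Proof. Let $\mathcal{T}$ be a first order theory over some language $L$, and $\mathfrak{M}$ a model of $\mathcal{T}$.
Since $\not\models_{\mathfrak{M}} \neg x = x$, it follows that $\nvdash_{\mathcal{T}} \neg x = x$, thus $\mathrm{Thm}_{\mathcal{T}} \neq \mathrm{Wff}$.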


First order definability in a structure. We are now in a position to make the
process of “translation” to and from informal mathematics rigorous.


I.5.15 Definition. Let $L$ be a first order language, and $\mathfrak{M}$ a structure for $L$. A
set (synonymously, relation) $S \subseteq M^n$ is (first order) definable in $\mathfrak{M}$ over $L$
iff for some formula $\mathcal{S}(y_1, \ldots, y_n)$ (see p. 19 for a reminder on round-bracket
notation) and for all $i_j$, $j = 1, \ldots, n$, in $M$,

$$\langle i_1, \ldots, i_n \rangle \in S \quad \text{iff} \quad \models_{\mathfrak{M}} \mathcal{S}(\bar{i}_1, \ldots, \bar{i}_n)$$

We often just say “definable in $\mathfrak{M}$”.


A function $f : M^n \to M$ is definable in $\mathfrak{M}$ over $L$ iff the relation $y =
f(x_1, \ldots, x_n)$ is so definable.

N.B. Some authors say “(first order) expressible” (Smullyan (1992)) rather
than “(first order) definable” in a structure.


In the context of $\mathfrak{M}$, the above definition gives precision to statements such
as “we code (or translate) an informal statement into the formal language” or
“the (formal language) formula $\mathcal{S}$ informally ‘says’ ...”, since any (informal)
“statement” (or relation) that depends on the informal variables $x_1, \ldots, x_n$ has
the form “$\langle x_1, \ldots, x_n \rangle \in S$” for some (informal) set $S$. It also captures the
essence of the statement “The (informal) statement $\langle x_1, \ldots, x_n \rangle \in S$ can be
written (or can be made) in the formal language.”


What “makes” the statement, in the formal language, is the formula $\mathcal{S}$.


I.5.16 Example. The informal statement “$z$ is a prime” has a formal translation

$$S0 < \mathbf{z} \land (\forall \mathbf{x})(\forall \mathbf{y})(\mathbf{z} = \mathbf{x} \times \mathbf{y} \to \mathbf{x} = \mathbf{z} \lor \mathbf{x} = S0)$$

over the language of elementary number theory, where nonlogical symbols
are $0, S, +, \times, <$ and the definition (translation) is effected in the standard
structure $\mathfrak{N} = (\mathbb{N}; 0; S, +, \times; <)$, where the function “$S$” of the structure satisfies,
for all $n \in \mathbb{N}$, $S(n) = n + 1$ and interprets the formal symbol “$S$” (see I.5.2, p. 54,
for the “unpacked” notation we have just used to denote the structure $\mathfrak{N}$). We have
used the variable name “z” both formally and informally, but we have used a
typographical trick: The formal variable was in boldface type while the informal
one was in lightface.
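
One can test numerically, on an initial segment of $\mathbb{N}$, that this formula picks out exactly the primes. The Python sketch below is only an illustration, with assumptions stated plainly: the function names are hypothetical, and the two universal quantifiers are searched only up to $z$. That bound is harmless here because once $1 < z$ holds, any $x, y$ with $z = x \times y$ satisfy $1 \le x, y \le z$, so larger values can never make the implication fail.

```python
def formula_holds(z: int) -> bool:
    """S0 < z and (for all x)(for all y)(z = x*y -> x = z or x = S0),
    evaluated in the standard structure on the natural numbers."""
    if not (1 < z):                      # S0 < z
        return False
    return all(x == z or x == 1          # x = z  or  x = S0
               for x in range(z + 1)
               for y in range(z + 1)
               if z == x * y)            # only pairs with z = x * y matter

def is_prime(z: int) -> bool:
    """Independent reference test: trial division."""
    return z >= 2 and all(z % d != 0 for d in range(2, z))

# The set defined by the formula agrees with the primes below 60.
agree = all(formula_holds(z) == is_prime(z) for z in range(60))
defined = [z for z in range(30) if formula_holds(z)]
print(defined)
```

A finite check like this is not a proof that the formula defines the set of primes in the sense of I.5.15, but it makes the “translation” tangible: `formula_holds` is a direct transcription of the formal formula, and `is_prime` is the informal notion.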


It must be said that translation is not just an art or skill. There are theoretical
limitations to translation. The trivial limitation is that if $M$ is an infinite set and,
say, $L$ has a finite set of nonlogical symbols (as is the case in arithmetic and
set theory), then we cannot define all $S \subseteq M$, simply because we do not have
enough first order formulas to do so.

There are non-trivial limitations too. Some sets are not first order definable
because their definitions are “far too complex” (the reader who wants more
on this comment may wish to look up the section on definability and
incompletableness in volume 1 of these lectures (Mathematical Logic)).


This is a good place to introduce a common notational argot that allows us to 
write mixed-mode formulas that have a formal part (over some language L) 
but may contain informal constants (names, to be sure, but names that have not 
formally been imported into $L$) from some structure $\mathfrak{M}$ appropriate for $L$.


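I.5.17 Informal Definition. Let $L$ be a first order language, and $\mathfrak{M} = (M, \mathcal{I})$
a structure for $L$. Let $\mathcal{A}$ be a formula with at most $x_1, \ldots, x_n$ free, and $i_1, \ldots, i_n$
be members of $M$. The notation $\mathcal{A}[[i_1, \ldots, i_n]]$ is an abbreviation of

$$(\mathcal{A}[\bar{i}_1, \ldots, \bar{i}_n])^{\mathcal{I}}.$$

This argot allows one to substitute informal objects into variables outright,
by-passing the procedure of importing formal names for such objects into the
language. It is noteworthy that mixed mode formulas can be defined directly by
induction on formulas (that is, without forming $L(\mathfrak{M})$ first) as follows:


Let $L$ and $\mathfrak{M}$ be as above. Let $x_1, \ldots, x_n$ contain all the free variables that
appear in a term $t$ or formula $\mathcal{A}$ over $L$ (not over $L(\mathfrak{M})$). Let $i_1, \ldots, i_n$ be
arbitrary in $M$.

For terms we define

$$
t[[i_1, \ldots, i_n]] =
\begin{cases}
i_j & \text{if } t \equiv x_j \ (1 \le j \le n) \\
a^{\mathcal{I}} & \text{if } t \equiv a \\
f^{\mathcal{I}}\big(t_1[[i_1, \ldots, i_n]], \ldots, t_r[[i_1, \ldots, i_n]]\big) & \text{if } t \equiv f t_1 \ldots t_r
\end{cases}
$$

For formulas we let

$$
\mathcal{A}[[i_1, \ldots, i_n]] =
\begin{cases}
t[[i_1, \ldots, i_n]] = s[[i_1, \ldots, i_n]] & \text{if } \mathcal{A} \equiv t = s \\
P^{\mathcal{I}}\big(t_1[[i_1, \ldots, i_n]], \ldots, t_r[[i_1, \ldots, i_n]]\big) & \text{if } \mathcal{A} \equiv P t_1 \ldots t_r \\
\neg\big(\mathcal{B}[[i_1, \ldots, i_n]]\big) & \text{if } \mathcal{A} \equiv \neg\mathcal{B} \\
\big(\mathcal{B}[[i_1, \ldots, i_n]] \lor \mathcal{C}[[i_1, \ldots, i_n]]\big) & \text{if } \mathcal{A} \equiv \mathcal{B} \lor \mathcal{C} \\
(\exists a \in M)\,\mathcal{B}[[a, i_1, \ldots, i_n]] & \text{if } \mathcal{A} \equiv (\exists z)\mathcal{B}[z, \vec{x}_n]
\end{cases}
$$

where “$(\exists a \in M)\ldots$” is short for “$(\exists a)(a \in M \land \ldots)$”. The right hand side
of = has no free (informal) variables, thus it evaluates to $\mathbf{t}$ or $\mathbf{f}$.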
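
The direct induction just described corresponds to the familiar “evaluate under an assignment” style of interpreter: no imported constants are created, and the informal values are carried in an environment instead. The following Python sketch is an illustration under assumptions made here (tuple encoding of syntax, a dictionary `env` playing the role of $i_1, \ldots, i_n$, a finite domain so that the quantifier clause can be searched); none of the names come from the text.

```python
def term_val(t, I, env):
    """t[[...]]: value of term t when its variables take the values in env."""
    if t[0] == "var":
        return env[t[1]]                 # i_j if t is x_j
    if t[0] == "const":
        return I[t[1]]                   # the interpretation of the constant
    # ("app", f, args): apply the interpretation of f to the argument values
    return I[t[1]](*(term_val(r, I, env) for r in t[2]))

def holds(A, M, I, env):
    """A[[...]]: truth value of formula A under the assignment env."""
    kind = A[0]
    if kind == "eq":
        return term_val(A[1], I, env) == term_val(A[2], I, env)
    if kind == "pred":
        return tuple(term_val(r, I, env) for r in A[2]) in I[A[1]]
    if kind == "not":
        return not holds(A[1], M, I, env)
    if kind == "or":
        return holds(A[1], M, I, env) or holds(A[2], M, I, env)
    # ("ex", z, B): some a in M makes B true once z is assigned a
    return any(holds(A[2], M, I, {**env, A[1]: a}) for a in M)

# Hypothetical structure: integers modulo 4 with addition.
M = range(4)
I = {"plus": lambda m, n: (m + n) % 4}

# A(x) is (Ez) z + z = x, which here says "x is even".
A = ("ex", "z", ("eq", ("app", "plus", (("var", "z"), ("var", "z"))), ("var", "x")))

values = [holds(A, M, I, {"x": i}) for i in M]   # A[[0]], A[[1]], A[[2]], A[[3]]
```

In this structure `values` is `[True, False, True, False]`. Compared with the substitution-based definition I.5.6, the design choice here is to extend the environment at a quantifier rather than rewrite the formula, which is the content of the last clause of the definition above.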


We now turn to the “hard half” of Gödel’s completeness theorem, which
states that our syntactic proof apparatus can faithfully mimic proofs by logical
implication. That is, the syntactic apparatus is “complete”.


I.5.18 Definition. A theory over $L$ (designated by its nonlogical axioms) $\Gamma$ is
semantically complete iff $\Gamma \models \mathcal{A}$ implies $\Gamma \vdash \mathcal{A}$ for any formula $\mathcal{A}$.


The term “semantically complete” is not used much. There is a competing
syntactic notion of completeness, that of simple completeness, also called just
completeness. The latter is the notion one has normally in mind when saying
“a complete theory”, or, in the opposite case, incomplete.

The proof of the semantic completeness of every first order theory hinges on
the consistency theorem, which we state without proof below.† The
completeness theorem will then be derived as a corollary.


I.5.19 Metatheorem (Consistency Theorem). If a (first order) theory $\mathcal{T}$ is
consistent, then it has a model.


Metamathematically speaking, a set $S$ is countable if it is finite or it can be put
in 1-1 correspondence with $\mathbb{N}$. The latter means that there is a total function
$f : \mathbb{N} \to S$ that is onto (that is, $(\forall x \in S)(\exists n \in \mathbb{N}) f(n) = x$ is true) and 1-1.
“1-1” means that $(\forall n \in \mathbb{N})(\forall m \in \mathbb{N})(f(n) = f(m) \to n = m)$ is true.

A set that is not countable is uncountable. Cantor has proved that the set of
reals, $\mathbb{R}$, is uncountable.


By definition, a language $L$ is countable or uncountable iff the set of its
nonlogical symbols is.
By definition, a model is countable or uncountable iff its domain is.


The technique of proof of 1.5.19 yields the following important corollaries. 


1.5.20 Corollary. A consistent theory over a countable language has a count- 
able model. 


I.5.21 Corollary (Löwenheim-Skolem Theorem). If a set of formulas $\Gamma$ over
a countable language has a model, then it has a countable model.


I.5.22 Corollary (Gödel’s Completeness Theorem, Hard Half). In any
countable first order language $L$, $\Gamma \models \mathcal{A}$ implies $\Gamma \vdash \mathcal{A}$.


Proof. Let $\mathcal{B}$ denote the universal closure of $\mathcal{A}$. By Exercise I.21, $\Gamma \models \mathcal{B}$.
Thus, $\Gamma + \neg\mathcal{B}$ has no models (why?). Therefore it is inconsistent. Thus, $\Gamma \vdash \mathcal{B}$
(by I.4.21), and hence (specialization), $\Gamma \vdash \mathcal{A}$.


A way to rephrase completeness is that if $\Gamma \models \mathcal{A}$, then also $\Delta \models \mathcal{A}$, where
$\Delta \subseteq \Gamma$ is finite. This follows by soundness, since $\Gamma \models \mathcal{A}$ entails $\Gamma \vdash \mathcal{A}$ and
hence $\Delta \vdash \mathcal{A}$, where $\Delta$ consists of just those formulas of $\Gamma$ used in the proof
of $\mathcal{A}$.


† For a proof see volume 1 of these lectures.

I.5.23 Corollary (Compactness Theorem). In any countable first order
language $L$, a set of formulas $\Gamma$ is satisfiable iff it is finitely satisfiable.


Proof. Only-if part. This is trivial, for a model of $\Gamma$ is a model of any finite
subset.

If part. Suppose that $\Gamma$ is unsatisfiable (it has no models). Then it is
inconsistent by the consistency theorem. In particular, $\Gamma \vdash \neg x = x$. Since the
pure theory over $L$ is consistent, a $\Gamma$-proof of $\neg x = x$ involves a nonempty
finite sequence of nonlogical axioms (formulas of $\Gamma$), $\mathcal{A}_1, \ldots, \mathcal{A}_n$. That is,
$\mathcal{A}_1, \ldots, \mathcal{A}_n \vdash \neg x = x$, hence $\{\mathcal{A}_1, \ldots, \mathcal{A}_n\}$ has no model (by soundness).
This contradicts the hypothesis.


Now, if the language $L$ is uncountable, we say that it has cardinality $k$ if $\mathcal{V}$
(or equivalently, the set of nonlogical symbols) does. Cardinality is studied
within ZFC in Chapter VII. However, to extend the consistency theorem and
its corollaries to uncountable $L$ one only needs to have an understanding of
the informal Cantorian concept and of its basic properties (e.g., the “real”
counterpart of VII.5.17) along with a basic (informal) understanding of ordinals.
The following is true (for a proof outline see volume 1 of these lectures).


I.5.24 Metatheorem (Consistency Theorem). If a (first order) theory $\mathcal{T}$ over a
language $L$ of cardinality $k$ is consistent, then it has a model of cardinality $\le k$.


I.5.25 Corollary (Completeness Theorem). In any first order language $L$,
$\Gamma \models \mathcal{A}$ implies $\Gamma \vdash \mathcal{A}$.


I.5.26 Corollary (Gödel-Mal’cev Compactness Theorem). In any first order
language $L$, a set of formulas $\Gamma$ is satisfiable iff it is finitely satisfiable.


The Löwenheim-Skolem theorem takes the following form:

I.5.27 Corollary (Upward Löwenheim-Skolem Theorem). If a set of formulas
$\Gamma$ over a language $L$ of cardinality $k$ has an infinite model, then it has a
model of any cardinality $n$ such that $k \le n$.

At one extreme, ZFC set theory’s intended model is so huge that it is not
even a set (its domain, that is, is not). At the other extreme, set theory has only
two primary nonlogical symbols; hence, if we believe that it is consistent,† it has
a countable model. Countable models play an important role in the metatheory
of ZFC (as we see, e.g., in Chapter VIII).

The (very condensed) material in this passage is not used anywhere in
this volume.

† We will have an opportunity to explain this hedging later on.


1.6. Defined Symbols 


We have already mentioned that the language lives, and it is being constantly 
enriched by new nonlogical symbols through definitions. The reason we do this 
is to abbreviate undecipherably long formal texts, thus making them humanly 
understandable. 

There are three possible kinds of formal abbreviations, namely, abbreviations 
of formulas, abbreviations of variable terms (i.e., objects that depend on free 
variables), and abbreviations of constant terms (i.e., objects that do not depend 
on free variables). Correspondingly, we introduce a new nonlogical symbol for 
a predicate, a function, or a constant in order to accomplish such abbreviations. 

Here are three simple examples, representative of each case. 

We introduce a new predicate (symbol), “$\subseteq$”, in set theory by a definition†

$$A \subseteq B \leftrightarrow (\forall x)(x \in A \to x \in B)$$


An introduction of a function symbol by definition is familiar from
elementary mathematics. There is a theorem that says

“for every non-negative real number $x$ there is a unique
non-negative real number $y$ such that $x = y \cdot y$”   (1)

This justifies the introduction of a 1-ary function symbol $f$ that, for each such $x$,
produces the corresponding $y$. Instead of using the generic “$f(x)$”, we normally
adopt one of the notations “$\sqrt{x}$” or “$x^{1/2}$”. Thus, we enrich the language (of,
say, algebra) by the function symbol $\sqrt{\ }$ and add as an axiom the definition of
its behaviour. This would be

$$x = \sqrt{x}\,\sqrt{x}$$

or

$$y = \sqrt{x} \leftrightarrow x = y \cdot y$$

where the restriction $x \ge 0$ is implied by the context.


† In practice we state the above definition in argot, probably as “$A \subseteq B$ means that, for all $x$, we
have $x \in A \to x \in B$”.

The “enabling formula” (1), stated in argot above, is crucial in order
that we be allowed to introduce $\sqrt{\ }$ and its defining axiom. That is, before we
introduce an abbreviation of a (variable or constant) term (i.e., an object) we
must have a proof in our theory of an existential formula, i.e., one of the type
$(\exists! y)\mathcal{A}$, that asserts that (if applicable, for each “value” of the free variables)
a unique such object exists.


The symbol “$(\exists! y)$” is read “there is a unique $y$”. It is a logical abbreviation
(defined logical symbol, just like $\forall$). For a bound variable $x$, $(\exists! x)\mathcal{A}$ is given
(in least parenthesized form) by

$$(\exists x)(\mathcal{A} \land \neg(\exists z)(\mathcal{A}[z] \land \neg x = z))$$


Finally, an example of introducing a new constant symbol, from set theory,
is the introduction of the symbol $\emptyset$ into the language, as the name of the unique
object† $y$ that satisfies $\neg U(y) \land (\forall x)x \notin y$, read “$y$ is a set‡ and it has no
members”. Thus, $\emptyset$ is defined by

$$\neg U(\emptyset) \land (\forall x)x \notin \emptyset$$

or, equivalently, by

$$y = \emptyset \leftrightarrow \neg U(y) \land (\forall x)x \notin y$$


The general situation is this: We start with a theory $\Gamma$, spoken in some
basic§ formal language $L$. As the development of $\Gamma$ proceeds, gradually and
continuously we extend $L$ into languages $L_n$, for $n \ge 0$ (we have set $L_0 = L$).
Thus the symbol $L_{n+1}$ stands for some arbitrary extension of $L_n$ effected at stage
$n + 1$. The theory itself is being extended by stages, as a sequence $\Gamma_n$, $n \ge 0$.

A stage is marked by the event of introducing a single new symbol into the
language via a definition of a new predicate, function, or constant symbol. At
that same stage we also add to $\Gamma_n$ the defining nonlogical axiom of the new
symbol in question, thus extending the theory $\Gamma_n$ into $\Gamma_{n+1}$. We set $\Gamma_0 = \Gamma$.

Specifically, if¶ $\mathcal{Q}(\vec{x}_n)$ is some formula we then can introduce a new
predicate symbol “$P$”‖ that stands for $\mathcal{Q}$.

† Uniqueness follows from extensionality, while existence follows from separation. These facts
(and the italicized terminology) are found in Chapter III.

‡ $U$ is a 1-ary (unary) predicate. It is one of the two primitive nonlogical symbols of formal set
theory. With the help of this predicate we can test an object for set or atom status. “$U(y)$” asserts
that $y$ is an atom; thus “$\neg U(y)$” asserts that $y$ is a set, since we accept that sets or atoms are the
only types of objects that the formal system axiomatically characterizes.

§ “Basic” means here the language given originally, before any new symbols were added.

¶ Recall that (see Remark I.1.11, p. 19) the notation $\mathcal{Q}(\vec{x}_n)$ asserts that $\vec{x}_n$, i.e., $x_1, \ldots, x_n$, is the
complete list of the free variables of $\mathcal{Q}$.

‖ Recall that predicate letters are denoted by non-calligraphic capital letters $P$, $Q$, $R$ with or without
subscripts or primes.
In the present description, $\mathcal{Q}$ is a syntactic (meta-)variable, while $P$ is a new formal predicate symbol.

This entails adding $P$ to $L_k$ (i.e., to its alphabet $\mathcal{V}_k$) as a new $n$-ary predicate symbol, and adding

$$P\vec{x}_n \leftrightarrow \mathcal{Q}(\vec{x}_n) \tag{i}$$

to $\Gamma_k$ as the defining axiom for $P$. “$\subseteq$” is such a defined (2-ary) predicate in set theory.

Similarly, a new $n$-ary function symbol $f$ is added into $L_k$ (to form $L_{k+1}$) by a definition of its behaviour. That is, we add $f$ to $L_k$ and also add the following formula (ii) to $\Gamma_k$ as a new nonlogical axiom

$$y = f y_1 \ldots y_n \leftrightarrow \mathcal{Q}(y, y_1, \ldots, y_n) \tag{ii}$$

provided we have a proof in $\Gamma_k$ of the formula

$$(\exists! y)\mathcal{Q}(y, y_1, \ldots, y_n) \tag{iii}$$

Depending on the theory and on the number of free variables ($n \ge 0$), “$f$” may take theory-specific names such as $\emptyset$, $\omega$, $\sqrt{\phantom{x}}$, etc. (in this illustration, for the sake of economy of effort, we have thought of defined constants, e.g., $\emptyset$ and $\omega$, as 0-ary functions).


In effecting these definitions, we want to be assured of two things:

(1) Whatever we can say in the richer language $L_k$ (for any $k > 0$) we can also state in the original (basic) language $L = L_0$ (although awkwardly, which justifies our doing all this). “Can be stated” means that we can translate any formula $\mathcal{F}$ over $L_k$ (hopefully in a “natural” way) into a formula $\mathcal{F}^*$ over $L$ so that the extended theory $\Gamma_k$ can prove that $\mathcal{F}$ and $\mathcal{F}^*$ are equivalent.†

(2) We also want to be assured that the new symbols offer no more than convenience, in the sense that any formula $\mathcal{F}$ over the basic language $L$ deducible from $\Gamma_k$ ($k > 0$), one way or another (perhaps with the help of defined symbols), is also deducible from $\Gamma$.‡

† $\Gamma$, spoken over $L$, can have no opinion, of course, since it cannot see the new symbols, nor does it have their definitions among its “knowledge”.

‡ Trivially, any $\mathcal{F}$ over $L$ that $\Gamma$ can prove, any $\Gamma_k$ ($k > 0$) can prove as well, since the latter understands the language ($L$) and contains all the axioms of $\Gamma$. Thus $\Gamma_k$ extends the theory $\Gamma$. That it cannot have more theorems over $L$ than $\Gamma$ makes this extension conservative.

These assurances will become available shortly, as Metatheorems I.6.1 and I.6.3. Here are the “natural” translation rules that take us from a language stage $L_{k+1}$ back to the previous, $L_k$ (so that, iterating the process, we get back to $L$):


Rule (1). Suppose that $\mathcal{F}$ is a formula over $L_{k+1}$, and that the predicate $P$ (whose definition took us from $L_k$ to $L_{k+1}$, and hence is a symbol of $L_{k+1}$ but not of $L_k$) occurs in $\mathcal{F}$ zero or more times. Assume that $P$ has been defined by the axiom (i) above (included in $\Gamma_{k+1}$), where $\mathcal{Q}$ is a formula over $L_k$. We eliminate $P$ from $\mathcal{F}$ by replacing all its occurrences by $\mathcal{Q}$. That is, whenever $P\vec{t}_n$ is a subformula of $\mathcal{F}$, all its occurrences are replaced by $\mathcal{Q}(\vec{t}_n)$. We can always arrange by I.4.13 that the simultaneous substitution $\mathcal{Q}[\vec{x}_n \leftarrow \vec{t}_n]$ is defined. This results in a formula $\mathcal{F}^*$ over $L_k$.

Rule (2). If $f$ is a defined $n$-ary function symbol as in (ii) above, introduced into $L_{k+1}$, and if it occurs in $\mathcal{F}$ as $\mathcal{F}[f t_1 \ldots t_n]$,† then this formula is logically equivalent to‡

$$(\exists y)(y = f t_1 \ldots t_n \land \mathcal{F}[y]) \tag{iv}$$

provided that $y$ is not free in $\mathcal{F}[f t_1 \ldots t_n]$. Using the definition of $f$ given by (ii), and I.4.13 to ensure that $\mathcal{Q}(y, \vec{t}_n)$ is defined, we eliminate this occurrence of $f$, writing (iv) as

$$(\exists y)(\mathcal{Q}(y, t_1, \ldots, t_n) \land \mathcal{F}[y]) \tag{v}$$

which says the same thing as (iv) in any theory that thinks that (ii) is true (this observation is made precise in the proof of Metatheorem I.6.1). Of course, $f$ may occur many times in $\mathcal{F}$, even “within itself”, as in $f f z_1 \ldots z_n y_2 \ldots y_n$,§ or even in more complicated configurations. Indeed, it may occur within the scope of a quantifier. So the rule becomes: Apply the transformation taking every atomic subformula $\mathcal{A}[f t_1 \ldots t_n]$ of $\mathcal{F}$ into the form (v) by stages, eliminating at each stage the leftmost-innermost¶ occurrence of $f$ (in the atomic formula we are transforming at this stage), until all occurrences of $f$ are eliminated. We now have a formula $\mathcal{F}^*$ over $L_k$.
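
As a small illustration of Rules (1) and (2), the following sketch eliminates two defined symbols from the formula $\emptyset \subseteq x$. It uses the defining axiom of $\emptyset$ given earlier. The defining axiom shown for $\subseteq$ is an assumption made only for this example (the actual definition used in set theory may carry additional conjuncts), as is the assumption that $\subseteq$ was introduced at a later stage than $\emptyset$.

```latex
% Requires amsmath and amssymb. Illustration only.
% Defining axiom of \emptyset (from the text, bound variable renamed to v):
%   y = \emptyset \leftrightarrow \lnot U(y) \land (\forall v)\, v \notin y
% Hypothetical defining axiom of \subseteq (assumed for this example):
%   x \subseteq z \leftrightarrow (\forall w)(w \in x \to w \in z)
\begin{align*}
\mathcal{F} \;&\equiv\; \emptyset \subseteq x \\
% Rule (1): replace the defined predicate by its defining formula.
&\leadsto\; (\forall w)(w \in \emptyset \to w \in x) \\
% Rule (2): the atomic subformula w \in \emptyset is A[\emptyset], where A[y] is w \in y.
&\leadsto\; (\forall w)\bigl((\exists y)(\lnot U(y) \land (\forall v)\, v \notin y \land w \in y) \to w \in x\bigr)
\;\equiv\; \mathcal{F}^*
\end{align*}
```

The resulting $\mathcal{F}^*$ mentions only the primitive symbols $U$ and $\in$. The bound variable of the defining axiom was renamed to $v$ so that the substitution is defined (cf. I.4.13).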


† This notation allows for the possibility that $f t_1 \ldots t_n$ does not occur at all in $\mathcal{F}$ (see the convention on brackets, p. 19).

‡ See (C) in the proof of Metatheorem I.6.1 below.

§ Or $f(f(z_1, \ldots, z_n), y_2, \ldots, y_n)$, using brackets and commas to facilitate reading.

¶ A term $f t_1 \ldots t_n$ is innermost iff none of the $t_i$ contains “$f$”.

I.6.1 Metatheorem (Elimination of Defined Symbols I). Let $\Gamma$ be any theory over some formal language $L$.

(a) Let the formula $\mathcal{Q}$ be over $L$, and $P$ be a new predicate symbol that extends $L$ into $L'$ and $\Gamma$ into $\Gamma'$ via the axiom $P\vec{x}_n \leftrightarrow \mathcal{Q}(\vec{x}_n)$. Then, for any formula $\mathcal{F}$ over $L'$, the $P$-elimination as in Rule (1) above yields an $\mathcal{F}^*$ over $L$ such that

$$\Gamma' \vdash \mathcal{F} \leftrightarrow \mathcal{F}^*$$

(b) Let $\mathcal{F}[x]$ be over $L$, and let $t$ stand for $f t_1 \ldots t_n$, where $f$ is introduced by (ii) above as an axiom that extends $\Gamma$ into $\Gamma'$. Assume that no $t_i$ contains the letter $f$ and that $y$ is not free in $\mathcal{F}[t]$. Then†

$$\Gamma' \vdash \mathcal{F}[t] \leftrightarrow (\exists y)(\mathcal{Q}(y, \vec{t}_n) \land \mathcal{F}[y])$$

Here “$L'$” is “$L_{k+1}$” (for some $k$) and “$L$” is “$L_k$”.


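Proof. First observe that this metatheorem indeed gives the assurance that, after applying the transformations (1) and (2) to obtain $\mathcal{F}^*$ from $\mathcal{F}$, $\Gamma'$ thinks that the two are equivalent.

(a): This follows immediately from the Leibniz rule (I.4.25).

(b): Start with

$$\vdash \mathcal{F}[t] \to t = t \land \mathcal{F}[t] \quad \text{(by } \vdash t = t \text{ and } \models_{\mathrm{Taut}}\text{-implication)} \tag{A}$$

Now, by Ax2, substitutability, and non-freedom of $y$ in $\mathcal{F}[t]$,

$$\vdash t = t \land \mathcal{F}[t] \to (\exists y)(y = t \land \mathcal{F}[y])$$

Hence

$$\vdash \mathcal{F}[t] \to (\exists y)(y = t \land \mathcal{F}[y]) \tag{B}$$

by (A) and $\models_{\mathrm{Taut}}$-implication.‡

Conversely,

$$\vdash y = t \to (\mathcal{F}[y] \leftrightarrow \mathcal{F}[t]) \quad \text{(Ax4; substitutability was used here)}$$

Hence (by $\models_{\mathrm{Taut}}$)

$$\vdash y = t \land \mathcal{F}[y] \to \mathcal{F}[t]$$

Therefore, by $\exists$-introduction (allowed, by our assumption on $y$),

$$\vdash (\exists y)(y = t \land \mathcal{F}[y]) \to \mathcal{F}[t]$$

which, along with (B), establishes

$$\vdash \mathcal{F}[t] \leftrightarrow (\exists y)(y = t \land \mathcal{F}[y]) \tag{C}$$

Finally, by (ii) (which introduces $\Gamma'$ to the left of $\vdash$), (C), and the Leibniz rule,

$$\Gamma' \vdash \mathcal{F}[t] \leftrightarrow (\exists y)(\mathcal{Q}(y, \vec{t}_n) \land \mathcal{F}[y]) \tag{D}$$

† As we already have remarked, in view of I.4.13, it is unnecessary pedantry to make assumptions on substitutability explicit.

‡ We will often write just “by $\models_{\mathrm{Taut}}$”, meaning to say “by $\models_{\mathrm{Taut}}$-implication”.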


The import of Metatheorem I.6.1 is that if we transform a formula $\mathcal{F}$, written over some arbitrary extension by definitions, $L_{k+1}$, of the basic language $L$, into a formula $\mathcal{F}^*$ over $L$, then $\Gamma_{k+1}$ (the theory over $L_{k+1}$ that has the benefit of all the added axioms) thinks that $\mathcal{F} \leftrightarrow \mathcal{F}^*$. The reason for this is that we can imagine that we eliminate one new symbol at a time, repeatedly applying the metatheorem above (part (b) to atomic subformulas), forming a sequence of increasingly more basic formulas $\mathcal{F}_{k+1}, \mathcal{F}_k, \mathcal{F}_{k-1}, \ldots, \mathcal{F}_0$, where $\mathcal{F}_0$ is the same string as $\mathcal{F}^*$ and $\mathcal{F}_{k+1}$ is the same string as $\mathcal{F}$.

Now, $\Gamma_{i+1} \vdash \mathcal{F}_{i+1} \leftrightarrow \mathcal{F}_i$ for $i = k, \ldots, 0$, where, if a defined function letter was eliminated at step $i + 1 \to i$, we invoke (D) above and the Leibniz rule. Hence, since $\Gamma_0 \subseteq \Gamma_1 \subseteq \cdots \subseteq \Gamma_{k+1}$, we have $\Gamma_{k+1} \vdash \mathcal{F}_{i+1} \leftrightarrow \mathcal{F}_i$ for $i = k, \ldots, 0$, and therefore $\Gamma_{k+1} \vdash \mathcal{F}_{k+1} \leftrightarrow \mathcal{F}_0$. □


I.6.2 Remark (One Point Rule). The absolutely provable formula in (C) above is sometimes called the one point rule (Gries and Schneider (1994), Tourlakis (2000a, 2000b, 2001)). Its “dual”

$$\mathcal{F}[t] \leftrightarrow (\forall y)(y = t \to \mathcal{F}[y])$$

is also given the same nickname and is easily (absolutely) provable using (C) by eliminating $\exists$.
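
For readers who want the step spelled out, here is one way the dual can be obtained from (C). This is a sketch rather than the text's own proof. It assumes, as in Metatheorem I.6.1(b), that $y$ is not free in $\mathcal{F}[t]$, and it applies (C) to the formula $\lnot\mathcal{F}$ in place of $\mathcal{F}$.

```latex
% Requires amsmath. F and t are as in Metatheorem I.6.1(b); y is not free in F[t].
\begin{align*}
\lnot\mathcal{F}[t]
  &\leftrightarrow (\exists y)(y = t \land \lnot\mathcal{F}[y])
  && \text{(C), applied to } \lnot\mathcal{F} \\
\mathcal{F}[t]
  &\leftrightarrow \lnot(\exists y)(y = t \land \lnot\mathcal{F}[y])
  && \text{negating both sides (tautological implication)} \\
  &\leftrightarrow (\forall y)\lnot(y = t \land \lnot\mathcal{F}[y])
  && (\forall y) \text{ abbreviates } \lnot(\exists y)\lnot \text{; double negation, Leibniz rule} \\
  &\leftrightarrow (\forall y)(y = t \to \mathcal{F}[y])
  && \text{tautological equivalence under the quantifier (Leibniz rule)}
\end{align*}
```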


I.6.3 Metatheorem (Elimination of Defined Symbols II). Let $\Gamma$ be a theory over a language $L$.

(a) If $L'$ denotes the extension of $L$ by the new predicate symbol $P$, and $\Gamma'$ denotes the extension of $\Gamma$ by the addition of the axiom $P\vec{x}_n \leftrightarrow \mathcal{Q}(\vec{x}_n)$, where $\mathcal{Q}$ is a formula over $L$, then $\Gamma \vdash \mathcal{F}$ for any formula $\mathcal{F}$ over $L$ such that $\Gamma' \vdash \mathcal{F}$.

(b) Assume that

$$\Gamma \vdash (\exists! y)\mathcal{R}(y, x_1, \ldots, x_n) \tag{$*$}$$

pursuant to which we defined the new function symbol $f$ by the axiom

$$y = f x_1 \ldots x_n \leftrightarrow \mathcal{R}(y, x_1, \ldots, x_n) \tag{$**$}$$

and thus extended $L$ to $L'$ and $\Gamma$ to $\Gamma'$. Then $\Gamma \vdash \mathcal{F}$ for any formula $\mathcal{F}$ over $L$ such that $\Gamma' \vdash \mathcal{F}$.


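Proof. This metatheorem assures that extensions of theories by definitions are conservative in that they produce convenience but no additional power (the same old theorems over the original language are the only ones provable).

(a): By the completeness theorem, we show instead that

$$\Gamma \models \mathcal{F} \tag{1}$$

So let $\mathfrak{M} = (M, \mathcal{I})$ be an arbitrary model of $\Gamma$, i.e., let

$$\models_{\mathfrak{M}} \Gamma \tag{2}$$

We now expand the structure $\mathfrak{M}$ into $\mathfrak{M}' = (M, \mathcal{I}')$, without adding any new individuals to its domain $M$, by adding an interpretation, $P^{\mathcal{I}'}$, for the new symbol $P$. We define for every $a_1, \ldots, a_n$ in $M$

$$P^{\mathcal{I}'}(a_1, \ldots, a_n) = \mathbf{t} \text{ iff } \models_{\mathfrak{M}} \mathcal{Q}(\overline{a}_1, \ldots, \overline{a}_n) \quad [\text{i.e., iff } \models_{\mathfrak{M}'} \mathcal{Q}(\overline{a}_1, \ldots, \overline{a}_n)]$$

Clearly then, $\mathfrak{M}'$ is a model of the new axiom, since, for all $\mathfrak{M}'$-instances of the axiom, such as $P(\overline{a}_1, \ldots, \overline{a}_n) \leftrightarrow \mathcal{Q}(\overline{a}_1, \ldots, \overline{a}_n)$, we have

$$\bigl(P(\overline{a}_1, \ldots, \overline{a}_n) \leftrightarrow \mathcal{Q}(\overline{a}_1, \ldots, \overline{a}_n)\bigr)^{\mathcal{I}'} = \mathbf{t}$$

It follows that $\models_{\mathfrak{M}'} \Gamma'$, since we have $\models_{\mathfrak{M}'} \Gamma$, the latter by (2), due to having made no changes to $\mathfrak{M}$ that affect the symbols of $L$. Thus, $\Gamma' \vdash \mathcal{F}$ yields $\models_{\mathfrak{M}'} \mathcal{F}$; hence, since $\mathcal{F}$ is over $L$, $\models_{\mathfrak{M}} \mathcal{F}$. Along with (2), this proves (1).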


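(b): As in (a), assume (2) in an attempt to prove (1). By ($*$),

$$\models_{\mathfrak{M}} (\exists! y)\mathcal{R}(y, x_1, \ldots, x_n)$$

Thus, there is a concrete (i.e., in the metatheory) function $\hat{f}$ of $n$ arguments that takes its inputs from $M$ and gives its outputs to $M$, the input-output relation being given by (3) below ($\vec{b}_n$ in, $a$ out). To be specific, the semantics of “$\exists!$” implies that for all $b_1, \ldots, b_n$ in $M$ there is a unique $a \in M$ such that

$$\bigl(\mathcal{R}(\overline{a}, \overline{b}_1, \ldots, \overline{b}_n)\bigr)^{\mathcal{I}} = \mathbf{t} \tag{3}$$

We now expand the structure $\mathfrak{M}$ into $\mathfrak{M}' = (M, \mathcal{I}')$,† so that all we add to it is an interpretation for the new function symbol $f$. We let $f^{\mathcal{I}'} = \hat{f}$. From (2) it follows that

$$\models_{\mathfrak{M}'} \Gamma \tag{2$'$}$$

since we made no changes to $\mathfrak{M}$ other than adding an interpretation of $f$, and since no formula in $\Gamma$ contains $f$. By (3), if $a, b_1, \ldots, b_n$ are any members of $M$, then we have

$$\begin{aligned}
\models_{\mathfrak{M}'} \overline{a} = f\,\overline{b}_1 \ldots \overline{b}_n \quad &\text{iff } a = \hat{f}(b_1, \ldots, b_n) \\
&\text{iff } \models_{\mathfrak{M}} \mathcal{R}(\overline{a}, \overline{b}_1, \ldots, \overline{b}_n) \quad \text{by the definition of } \hat{f} \\
&\text{iff } \models_{\mathfrak{M}'} \mathcal{R}(\overline{a}, \overline{b}_1, \ldots, \overline{b}_n)
\end{aligned}$$

the last “iff” because $\mathcal{R}$ (over $L$) means the same thing in $\mathfrak{M}$ and $\mathfrak{M}'$. Thus,

$$\models_{\mathfrak{M}'} y = f x_1 \ldots x_n \leftrightarrow \mathcal{R}(y, x_1, \ldots, x_n) \tag{4}$$

Now ($**$), (2$'$) and (4) yield $\models_{\mathfrak{M}'} \Gamma'$, which implies $\models_{\mathfrak{M}'} \mathcal{F}$ (from $\Gamma' \vdash \mathcal{F}$). Finally, since $\mathcal{F}$ contains no $f$, $\models_{\mathfrak{M}} \mathcal{F}$. This last result and (2) give (1). □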


† This part is independent of part (a); hence this is a different $\mathcal{I}'$ in general.

I.6.4 Remark.

(a) We note that translation rules (1) and (2), the latter applied to atomic subformulas, preserve the syntactic structure of quantifier prefixes. For example, suppose that we have introduced $f$ in set theory by

$$y = f x_1 \ldots x_n \leftrightarrow \mathcal{Q}(y, x_1, \ldots, x_n) \tag{5}$$

Now, an application of the collection axiom of set theory has a hypothesis of the form

$$\text{“}(\forall x \in Z)(\exists w)(\ldots \mathcal{A}[f t_1 \ldots t_n] \ldots)\text{”} \tag{6}$$

where, say, $\mathcal{A}$ is atomic and the displayed $f$ is innermost. Eliminating this $f$ we have the translation

$$\text{“}(\forall x \in Z)(\exists w)(\ldots (\exists y)(\mathcal{A}[y] \land \mathcal{Q}(y, t_1, \ldots, t_n)) \ldots)\text{”} \tag{7}$$

which still has the $\forall\exists$-prefix and still looks exactly like a collection axiom hypothesis.

(b) Rather than worrying about the ontology of the function symbol formally introduced by (5) above, i.e., the question of the exact nature of the symbol that we named “$f$”, in practice we shrug this off and resort to metalinguistic devices to name the function symbol, or the term that naturally arises from it. For example, one can use the notation “$f_{\mathcal{Q}}$” for the function, where the subscript “$\mathcal{Q}$” is the exact string over the language that “$\mathcal{Q}$” denotes, or, for the corresponding term, the notation of Whitehead and Russell (1912),

$$(\iota z)\mathcal{Q}(z, x_1, \ldots, x_n) \tag{8}$$

The “$z$” in (8) above is a bound variable.† This new type of term is read “the unique $z$ such that ...”.

This “$\iota$” is not one of our primitive symbols.‡ It is just meant to lead to the friendly shorthand (8) above that avoids the ontology issue.


Thus, once one proves

$$(\exists! z)\mathcal{Q}(z, x_1, \ldots, x_n) \tag{9}$$

one can then introduce (8) by the axiom

$$y = (\iota z)\mathcal{Q}(z, x_1, \ldots, x_n) \leftrightarrow \mathcal{Q}(y, x_1, \ldots, x_n) \tag{5$'$}$$

which, of course, is an alias for axiom (5), using more suggestive notation for the term $f x_1 \ldots x_n$.

By (9), axioms (5) or (5$'$) can be replaced by

$$\mathcal{Q}(f x_1 \ldots x_n, x_1, \ldots, x_n)$$

and

$$\mathcal{Q}\bigl((\iota z)\mathcal{Q}(z, x_1, \ldots, x_n), x_1, \ldots, x_n\bigr) \tag{10}$$

respectively. For example, from (5$'$) we get (10) by substitution. Now, Ax4 (with some help from $\models_{\mathrm{Taut}}$) yields

$$\mathcal{Q}\bigl((\iota z)\mathcal{Q}(z, x_1, \ldots, x_n), x_1, \ldots, x_n\bigr) \to \bigl(y = (\iota z)\mathcal{Q}(z, x_1, \ldots, x_n) \to \mathcal{Q}(y, x_1, \ldots, x_n)\bigr)$$

Hence, assuming (10),

$$y = (\iota z)\mathcal{Q}(z, x_1, \ldots, x_n) \to \mathcal{Q}(y, x_1, \ldots, x_n) \tag{11}$$

Finally, deploying (9), we get

$$\mathcal{Q}\bigl((\iota z)\mathcal{Q}(z, x_1, \ldots, x_n), x_1, \ldots, x_n\bigr) \to \bigl(\mathcal{Q}(y, x_1, \ldots, x_n) \to y = (\iota z)\mathcal{Q}(z, x_1, \ldots, x_n)\bigr)$$

Hence

$$\mathcal{Q}(y, x_1, \ldots, x_n) \to y = (\iota z)\mathcal{Q}(z, x_1, \ldots, x_n)$$

by (10). This, along with (11), yields (5$'$). □

† That it must be distinct from the $x_i$ is obvious.

‡ It is however possible to enlarge our alphabet to include “$\iota$”, and then add definitions of the syntax of “$\iota$-terms” and axioms for the behaviour of “$\iota$-terms”. At the end of all this one gets a conservative extension of the original theory, i.e., any $\iota$-free formula provable in the new theory can be also proved in the old (Hilbert and Bernays (1968)).


The indefinite article. We often have the following situation: We have proved a statement like

$$(\exists x)\mathcal{A}[x] \tag{1}$$

and we want next to derive a statement $\mathcal{B}$.

To this end, we start by picking a symbol $c$ not in $\mathcal{B}$ and say “let $c$ be such that $\mathcal{A}[c]$ is true”. That is, we add $\mathcal{A}[c]$ as a nonlogical axiom, treating $c$ as a new constant. From all these assumptions we then manage to prove $\mathcal{B}$, hopefully treating all the free variables of $\mathcal{A}[c]$ as constants during the argument. We then conclude that $\mathcal{B}$ has been derived without the help of $\mathcal{A}[c]$ or $c$ (see I.4.27).

Two things are noteworthy in this technique: One, $c$ does not occur in the conclusion, and, two, $c$ is not uniquely determined by (1). So we have a $c$, rather than the $c$, that makes $\mathcal{A}[c]$ true.

Now the suggestion that the free variables of the latter be frozen during the derivation of $\mathcal{B}$ is unnecessarily restrictive, and we have a more general result:

Suppose that

$$\Gamma \vdash (\exists x)\mathcal{A}(x, y_1, \ldots, y_n) \tag{2}$$

Add a new function symbol $f$ to the language $L$ of $\Gamma$ (thus obtaining $L'$) via the axiom

$$\mathcal{A}(f y_1 \ldots y_n, y_1, \ldots, y_n) \tag{3}$$

This says, intuitively, “for any $y_1, \ldots, y_n$, let $x = f y_1 \ldots y_n$ make $\mathcal{A}(x, y_1, \ldots, y_n)$ true”. Again, this $x$ is not uniquely determined by (2).

Finally, suppose that we have a proof

$$\Gamma + \mathcal{A}(f y_1 \ldots y_n, y_1, \ldots, y_n) \vdash \mathcal{B} \tag{4}$$


+ Cf IL41. 


76 I. A Bit of Logic: A User’s Toolbox 


such that $f$, the new function symbol, occurs nowhere in $\mathcal{B}$, i.e., the latter formula is over $L$. We can conclude then that

$$\Gamma \vdash \mathcal{B} \tag{5}$$

that is, the extension $\Gamma + \mathcal{A}(f y_1 \ldots y_n, y_1, \ldots, y_n)$ of $\Gamma$ is conservative.


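A proof of the legitimacy of this technique, based on the completeness theorem, is easy. Let

$$\models_{\mathfrak{M}} \Gamma \tag{6}$$

and show

$$\models_{\mathfrak{M}} \mathcal{B} \tag{7}$$

Expand the model $\mathfrak{M} = (M, \mathcal{I})$ to $\mathfrak{M}' = (M, \mathcal{I}')$ so that $\mathcal{I}'$ interprets the new symbol $f$. The interpretation is chosen as follows:

(2) guarantees that, for all choices of $i_1, \ldots, i_n$ in $M$, the set $S(i_1, \ldots, i_n) = \{a \in M : \models_{\mathfrak{M}} \mathcal{A}(\overline{a}, \overline{i}_1, \ldots, \overline{i}_n)\}$ is not empty. By the axiom of choice (of informal set theory), we can pick† an $a(i_1, \ldots, i_n)$ in each $S(i_1, \ldots, i_n)$. Thus, we define a function $\hat{f} : M^n \to M$ by letting, for each $i_1, \ldots, i_n$ in $M$, $\hat{f}(i_1, \ldots, i_n) = a(i_1, \ldots, i_n)$.

The next step is to set

$$f^{\mathcal{I}'} = \hat{f}$$

Therefore, for all $i_1, \ldots, i_n$ in $M$,

$$(f\,\overline{i}_1 \ldots \overline{i}_n)^{\mathcal{I}'} = \hat{f}(i_1, \ldots, i_n) = a(i_1, \ldots, i_n)$$

It is now clear that $\models_{\mathfrak{M}'} \mathcal{A}(f y_1 \ldots y_n, y_1, \ldots, y_n)$, for, by I.5.11,

$$\bigl(\mathcal{A}(f\,\overline{i}_1 \ldots \overline{i}_n, \overline{i}_1, \ldots, \overline{i}_n)\bigr)^{\mathcal{I}'} = \mathbf{t} \text{ iff } \bigl(\mathcal{A}(\overline{a(i_1, \ldots, i_n)}, \overline{i}_1, \ldots, \overline{i}_n)\bigr)^{\mathcal{I}'} = \mathbf{t}$$

and the right hand side of the above is true by the choice of $a(i_1, \ldots, i_n)$.

Thus, $\models_{\mathfrak{M}'} \Gamma + \mathcal{A}(f y_1 \ldots y_n, y_1, \ldots, y_n)$; hence $\models_{\mathfrak{M}'} \mathcal{B}$, by (4).

Since $\mathcal{B}$ contains no $f$, we also have $\models_{\mathfrak{M}} \mathcal{B}$; thus we have established (7) from (6). We now have (5).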

One can give a number of names to a function like $f$: A Skolem function, an $\varepsilon$-term (Hilbert and Bernays (1968)), or a $\tau$-term (Bourbaki (1966b)). In the first case one may ornament the symbol $f$, e.g., $f_{\exists\mathcal{A}}$, to show where it is coming from, although such mnemonic naming is not, of course, mandatory. The last two terminologies actually apply to the term $f y_1 \ldots y_n$, rather than to the function symbol $f$.

† The “$(i_1, \ldots, i_n)$” part indicates that “$a$” depends on $i_1, \ldots, i_n$.

Hilbert would have written

$$(\varepsilon x)\mathcal{A}(x, y_1, \ldots, y_n) \tag{8}$$

and Bourbaki

$$(\tau x)\mathcal{A}(x, y_1, \ldots, y_n) \tag{9}$$

each denoting $f y_1 \ldots y_n$. The “$x$” in each of (8) and (9) is a bound variable (different from each $y_i$).


I.7. Formalizing Interpretations


In Section I.5 we discussed Tarski semantics. As we pointed out there (footnote, 
p. 54), this semantics, while rigorous, is not formal. It is easy to formalize Tarski 
semantics, and we do so in this section not out of a compulsion to formalize, 
but because formal interpretations are at the heart of many relative consistency 
results, some of which we want to discuss in this volume. 

As always, we start with a formal language, $L$. We want to interpret its terms and formulas inside some appropriate structure $\mathfrak{M} = (M, \mathcal{I})$. This time, instead of relying on the metatheory to provide us with a universe of discourse, $M$, we will have another formal language† $L_i$ and a theory $\mathcal{T}_i$ over $L_i$ to supply the structure.

Now, such a universe is, intuitively, a collection of individuals. Any formula $\mathcal{M}(x)$ over $L_i$ can formally denote a collection of objects. For example, we may think of $\mathcal{M}(x)$ as defining “the collection of all $x$ such that $\mathcal{M}(x)$ holds” (whatever we may intuitively understand by “holds”).

We have carefully avoided saying “set of all $x$ such that $\mathcal{M}(x)$ holds”, since, if (for example) $L_i$ is an extension of the language of set theory, then “the collection of all $x$ such that $x \notin x$ holds” is not a set.‡ Intuitively, such collections are of “enormous size” (this being the reason, again intuitively, that prevents them from being sets).

The fact that a formula $\mathcal{M}(x)$ might formally denote a collection that is not a set is perfectly consistent with our purposes. After all, the intended interpretation of set theory has such a non-set collection as its universe.

† The subscript “$i$” is a weak attempt on my part to keep reminding us throughout this section that $L_i$ and $\mathcal{T}_i$ are to implement an interpretation of $L$.

‡ See I.2.1.
The requirement that a universe be nonempty, or that it be true in the metatheory that $M \ne \emptyset$, translates to the formal requirement that $\mathcal{T}_i$ can syntactically† certify the nonemptiness:

$$\vdash_{\mathcal{T}_i} (\exists x)\mathcal{M}(x) \tag{1}$$

The primary interpretation mapping, $\mathcal{I}$, is similar to the one defined in I.5.1. We summarize what we have agreed to do so far, “translating” Definition I.5.1 to the one below.


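I.7.1 Definition. Given a language $L = (\mathcal{V}, \mathrm{Term}, \mathrm{Wff})$. A formal interpretation of $L$ is a 4-tuple $\mathfrak{I} = (L_i, \mathcal{T}_i, \mathcal{M}(x), \mathcal{I})$, where $L_i = (\mathcal{V}_i, \mathrm{Term}_i, \mathrm{Wff}_i)$ is a first order language (possibly, the same as $L$), $\mathcal{T}_i$ is a theory over $L_i$, $\mathcal{M}(x)$ is a formula over $L_i$, and $\mathcal{I}$ is a total mapping from the set of nonlogical symbols of $L$ into the set of nonlogical symbols of $L_i$. Moreover, it is required that the following hold:

(i) (1) above holds.
(ii) For each constant $a$ of $\mathcal{V}$, $a^{\mathcal{I}}$ is a constant of $\mathcal{V}_i$ such that $\vdash_{\mathcal{T}_i} \mathcal{M}(a^{\mathcal{I}})$.
(iii) For each function $f$ of $\mathcal{V}$, of arity $n$, $f^{\mathcal{I}}$ is a function of $\mathcal{V}_i$, of arity $n$, such that

$$\vdash_{\mathcal{T}_i} \mathcal{M}(x_1) \land \mathcal{M}(x_2) \land \cdots \land \mathcal{M}(x_n) \to \mathcal{M}(f^{\mathcal{I}} x_1 x_2 \ldots x_n)$$

(iv) For each predicate $P$ of $\mathcal{V}$, of arity $n$, $P^{\mathcal{I}}$ is a predicate of $\mathcal{V}_i$, of arity $n$.

The conditions in I.7.1(ii) and I.7.1(iii) simply say that the universe $\{x : \mathcal{M}\}$ is closed under constants (i.e., contains the interpreting constants, $a^{\mathcal{I}}$) and under the interpreting functions, $f^{\mathcal{I}}$.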

Some authors will not assume that $L_i$ already has enough nonlogical symbols to effect the mapping $\mathcal{I}$ as plainly as in the definition above. They will instead say that, for example, to any $n$-ary $f$ of $L$, $\mathcal{I}$ will assign a formula $\mathcal{A}(y, \vec{x}_n)$ of $L_i$ such that

$$\vdash_{\mathcal{T}_i} \mathcal{M}(x_1) \land \cdots \land \mathcal{M}(x_n) \to (\exists! y)(\mathcal{M}(y) \land \mathcal{A}(y, \vec{x}_n))$$

In view of our work in the previous section, this would be an unreasonably roundabout way for us to tell the story.

Similarly, the results of Section I.6 allow us, without loss of generality, to always assume that the formula $\mathcal{M}$ in an interpretation $\mathfrak{I} = (\ldots, \mathcal{M}, \ldots)$ is atomic, $Px$, where $P$ is some unary predicate.

† We thus substitute the syntactic, or formal, requirement of provability for the semantic, or informal, concept of truth.
We next formalize the extension of $\mathcal{I}$ to all terms and formulas (cf. I.5.5 and I.5.6).

I.7.2 Definition. For every term $t$ over $L$, we define its relativization to $\mathcal{M}$, in symbols $t^{\mathcal{M}}$, by induction on $t$:

$$t^{\mathcal{M}} \equiv \begin{cases}
a^{\mathcal{I}} & \text{if } t \equiv a \\
z & \text{if } t \equiv z \text{ (a variable)} \\
f^{\mathcal{I}} t_1^{\mathcal{M}} \ldots t_n^{\mathcal{M}} & \text{if } t \equiv f t_1 \ldots t_n
\end{cases}$$

where $t_1, \ldots, t_n$ are terms over $L$, and $f$ is an $n$-ary function of $L$.

A trivial induction (on terms $t$ over $L$) proves that $t^{\mathcal{M}}$ is a term over $L_i$.


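I.7.3 Definition. For every $\mathcal{A}$ over $L$, we define its relativization to $\mathcal{M}$, in symbols $\mathcal{A}^{\mathcal{M}}$, by induction on $\mathcal{A}$:

$$\mathcal{A}^{\mathcal{M}} \equiv \begin{cases}
t^{\mathcal{M}} = s^{\mathcal{M}} & \text{if } \mathcal{A} \equiv t = s \\
P^{\mathcal{I}} t_1^{\mathcal{M}} \ldots t_n^{\mathcal{M}} & \text{if } \mathcal{A} \equiv P t_1 \ldots t_n \\
\lnot(\mathcal{B}^{\mathcal{M}}) & \text{if } \mathcal{A} \equiv \lnot\mathcal{B} \\
(\mathcal{B}^{\mathcal{M}}) \lor (\mathcal{C}^{\mathcal{M}}) & \text{if } \mathcal{A} \equiv \mathcal{B} \lor \mathcal{C} \\
(\exists z)(\mathcal{M}(z) \land \mathcal{B}^{\mathcal{M}}) & \text{if } \mathcal{A} \equiv (\exists z)\mathcal{B}
\end{cases}$$

where $s, t, t_1, \ldots, t_n$ are terms over $L$, and $P$ is an $n$-ary predicate of $L$.

The two definitions I.7.2 and I.7.3 are entirely analogous with the definition of mixed mode formulas (I.5.17). The analogy stands out if we imagine that “$\mathcal{A}^{\mathcal{M}}$” is some kind of novel notation for “$\mathcal{A}[[\ldots]]$”. Particularly telling is the last case (pretend that we have let $M = \{x : \mathcal{M}(x)\}$, where $M$ may or may not be a set).

We have restricted the definition to the primary logical symbols. Thus, e.g., just as $(\forall x)\mathcal{A}$ abbreviates $\lnot(\exists x)\lnot\mathcal{A}$, we have that $((\forall x)\mathcal{A})^{\mathcal{M}}$ abbreviates $\lnot((\exists x)\lnot\mathcal{A})^{\mathcal{M}}$, i.e., $\lnot(\exists x)(\mathcal{M}(x) \land \lnot\mathcal{A}^{\mathcal{M}})$, or, in terms of “$\forall$”, $(\forall x)(\mathcal{M}(x) \to \mathcal{A}^{\mathcal{M}})$.

A trivial induction (on formulas $\mathcal{A}$ over $L$) proves that $\mathcal{A}^{\mathcal{M}}$ is a formula over $L_i$.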
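
To see Definitions I.7.2 and I.7.3 at work, here is a small worked relativization. The sample sentence, and the assumption that $L$ has $\in$ as a binary predicate written infix, are chosen only for illustration.

```latex
% Requires amsmath. Assumes L has a binary predicate \in (written infix),
% whose interpreting predicate in L_i is \in^{\mathcal{I}}.
\begin{align*}
(x \in y)^{\mathcal{M}}
  &\equiv x \mathrel{\in^{\mathcal{I}}} y
  && \text{atomic case; } x^{\mathcal{M}} \equiv x,\ y^{\mathcal{M}} \equiv y \\
\bigl((\exists y)\, x \in y\bigr)^{\mathcal{M}}
  &\equiv (\exists y)\bigl(\mathcal{M}(y) \land x \mathrel{\in^{\mathcal{I}}} y\bigr)
  && \exists \text{ case} \\
\bigl((\forall x)(\exists y)\, x \in y\bigr)^{\mathcal{M}}
  &\equiv \lnot(\exists x)\bigl(\mathcal{M}(x) \land \lnot(\exists y)(\mathcal{M}(y) \land x \mathrel{\in^{\mathcal{I}}} y)\bigr)
  && \forall \text{ unfolded as } \lnot\exists\lnot
\end{align*}
```

In terms of “$\forall$” the last line reads $(\forall x)(\mathcal{M}(x) \to (\exists y)(\mathcal{M}(y) \land x \in^{\mathcal{I}} y))$: relativization leaves the Boolean structure alone and bounds every quantifier by $\mathcal{M}$.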


We have defined in Section I.5 the symbol $\models_{\mathfrak{M}} \mathcal{A}(x_1, \ldots, x_n)$ to mean

$$\text{For all } a_1, \ldots, a_n \text{ in } M,\ \mathcal{A}[[a_1, \ldots, a_n]] \text{ is true} \tag{1}$$

Correspondingly, we define the formalization of (1) in $\mathfrak{I}$. Unfortunately, we will use the same symbol as above, $\models$. However, the context will reveal whether it is the semantic or syntactic (formal) version that we are talking about. In the latter case we have a subscript, $\mathfrak{I}$, that is a formal interpretation (not a metamathematical structure) name.


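I.7.4 Definition. Let $\mathfrak{I} = (L_i, \mathcal{T}_i, \mathcal{M}(x), \mathcal{I})$ be a formal interpretation for a language $L$. For any formula $\mathcal{A}(x_1, \ldots, x_n)$ over $L$, the symbol

$$\models_{\mathfrak{I}} \mathcal{A}(x_1, \ldots, x_n) \tag{2}$$

is short for

$$\vdash_{\mathcal{T}_i} \mathcal{M}(x_1) \land \mathcal{M}(x_2) \land \cdots \land \mathcal{M}(x_n) \to \mathcal{A}^{\mathcal{M}}(x_1, \ldots, x_n) \tag{3}$$

The part “$\mathcal{M}(x_1) \land \mathcal{M}(x_2) \land \cdots \land \mathcal{M}(x_n) \to$” in (3) is empty if $\mathcal{A}$ is a sentence.

We will (very reluctantly) pronounce (2) above “$\mathcal{A}(x_1, \ldots, x_n)$ is true in the interpretation $\mathfrak{I}$”. Even though we have said “true”, the context will alert us to the argot use of the term, and that we really are talking about provability, i.e., (3), here.

The following lemma is the counterpart of Lemma I.5.11.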


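I.7.5 Lemma. Given terms $s$ and $t$ and a formula $\mathcal{A}$, all over $L$. Then $(s[x \leftarrow t])^{\mathcal{M}} \equiv s^{\mathcal{M}}[x \leftarrow t^{\mathcal{M}}]$ and $(\mathcal{A}[x \leftarrow t])^{\mathcal{M}} \equiv \mathcal{A}^{\mathcal{M}}[x \leftarrow t^{\mathcal{M}}]$.

We assume that the operation $[x \leftarrow t]$ is possible, without loss of generality.

Proof. The details of the two inductions, on terms $s$ and formulas $\mathcal{A}$, are left to the reader (see the proof of I.5.11).

We only look at one “hard case” in each induction:

Induction on terms $s$. Let $s \equiv f t_1 t_2 \ldots t_n$. Then

$$\begin{aligned}
(s[x \leftarrow t])^{\mathcal{M}} &\equiv (f\, t_1[x \leftarrow t] \ldots t_n[x \leftarrow t])^{\mathcal{M}} \\
&\equiv f^{\mathcal{I}} (t_1[x \leftarrow t])^{\mathcal{M}} \ldots (t_n[x \leftarrow t])^{\mathcal{M}} && \text{by I.7.2} \\
&\equiv f^{\mathcal{I}} t_1^{\mathcal{M}}[x \leftarrow t^{\mathcal{M}}] \ldots t_n^{\mathcal{M}}[x \leftarrow t^{\mathcal{M}}] && \text{by I.H.} \\
&\equiv (f^{\mathcal{I}} t_1^{\mathcal{M}} \ldots t_n^{\mathcal{M}})[x \leftarrow t^{\mathcal{M}}] \\
&\equiv s^{\mathcal{M}}[x \leftarrow t^{\mathcal{M}}] && \text{by I.7.2}
\end{aligned}$$

Induction on formulas $\mathcal{A}$. Let $\mathcal{A} \equiv (\exists w)\mathcal{B}$ and $w \not\equiv x$. Then

$$\begin{aligned}
(\mathcal{A}[x \leftarrow t])^{\mathcal{M}} &\equiv \bigl(((\exists w)\mathcal{B})[x \leftarrow t]\bigr)^{\mathcal{M}} \\
&\equiv \bigl((\exists w)\mathcal{B}[x \leftarrow t]\bigr)^{\mathcal{M}} && \text{(recall the priority of } [\ldots]\text{)} \\
&\equiv (\exists w)\bigl(\mathcal{M}(w) \land (\mathcal{B}[x \leftarrow t])^{\mathcal{M}}\bigr) && \text{by I.7.3} \\
&\equiv (\exists w)\bigl(\mathcal{M}(w) \land \mathcal{B}^{\mathcal{M}}[x \leftarrow t^{\mathcal{M}}]\bigr) && \text{by I.H.} \\
&\equiv \bigl((\exists w)(\mathcal{M}(w) \land \mathcal{B}^{\mathcal{M}})\bigr)[x \leftarrow t^{\mathcal{M}}] && \text{by } w \not\equiv x \\
&\equiv ((\exists w)\mathcal{B})^{\mathcal{M}}[x \leftarrow t^{\mathcal{M}}] && \text{by I.7.3}
\end{aligned}$$
□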


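We will also need the following lemma. It says that all “interpreting objects” are in $\{x : \mathcal{M}\}$.

I.7.6 Lemma. For any term $t$ over $L$,

$$\vdash_{\mathcal{T}_i} \mathcal{M}(x_1) \land \cdots \land \mathcal{M}(x_n) \to \mathcal{M}(t^{\mathcal{M}}[\vec{x}_n]) \tag{4}$$

where all the free variables of $t$ are among the $\vec{x}_n$.

Proof. We have three cases.

(a) $t \equiv a$, a constant. Then the prefix “$\mathcal{M}(x_1) \land \cdots \land \mathcal{M}(x_n) \to$” is empty in (4), and the result follows from I.7.1(ii).

(b) $t \equiv z$, a variable. Then (4) becomes $\vdash_{\mathcal{T}_i} \mathcal{M}(z) \to \mathcal{M}(z)$.

(c) $t \equiv f t_1 \ldots t_n$. Now (4) is

$$\vdash_{\mathcal{T}_i} \mathcal{M}(x_1) \land \cdots \land \mathcal{M}(x_n) \to \mathcal{M}(f^{\mathcal{I}} t_1^{\mathcal{M}}[\vec{x}_n] \ldots t_n^{\mathcal{M}}[\vec{x}_n]) \tag{5}$$

To see why (5) holds, freeze the $\vec{x}_n$ and add the axiom $\mathcal{B} \equiv \mathcal{M}(x_1) \land \cdots \land \mathcal{M}(x_n)$ to $\mathcal{T}_i$. By the I.H.,

$$\vdash_{\mathcal{T}_i + \mathcal{B}} \mathcal{M}(t_i^{\mathcal{M}}[\vec{x}_n]) \quad \text{for } i = 1, \ldots, n$$

By tautological implication, substitution (I.4.12), and I.7.1(iii), the above yields

$$\vdash_{\mathcal{T}_i + \mathcal{B}} \mathcal{M}(f^{\mathcal{I}} t_1^{\mathcal{M}}[\vec{x}_n] \ldots t_n^{\mathcal{M}}[\vec{x}_n])$$

The deduction theorem does the rest. □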


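We are ready to prove our key result in this connection, namely soundness.

I.7.7 Theorem. Let $\mathfrak{I} = (L_i, \mathcal{T}_i, \mathcal{M}, \mathcal{I})$ be a formal interpretation of a language $L$. Then for any $\mathcal{A} \in \Lambda$ over $L$ (cf. I.3.13),

$$\models_{\mathfrak{I}} \mathcal{A}(x_1, \ldots, x_n)$$

Proof. We want

$$\vdash_{\mathcal{T}_i} \mathcal{M}(x_1) \land \cdots \land \mathcal{M}(x_n) \to \mathcal{A}^{\mathcal{M}}(x_1, \ldots, x_n) \tag{6}$$

for all $\mathcal{A} \in \Lambda$. We have several cases.

Ax1. Let $\mathcal{A}(\vec{x}_n)$ be a tautology. As the operation $\ldots^{\mathcal{M}}$ does not change the Boolean connectivity of a formula, so is $\mathcal{A}^{\mathcal{M}}(\vec{x}_n)$. Thus, (6) follows by tautological implication.

Ax2. Let $\mathcal{A}(\vec{x}, \vec{y}, \vec{z}) \equiv \mathcal{B}(\vec{x}, t(\vec{x}, \vec{y}), \vec{z}) \to (\exists w)\mathcal{B}(\vec{x}, w, \vec{z})$. By I.7.5,

$$\mathcal{A}^{\mathcal{M}}(\vec{x}, \vec{y}, \vec{z}) \equiv \mathcal{B}^{\mathcal{M}}(\vec{x}, t^{\mathcal{M}}(\vec{x}, \vec{y}), \vec{z}) \to (\exists w)(\mathcal{M}(w) \land \mathcal{B}^{\mathcal{M}}(\vec{x}, w, \vec{z}))$$

By I.7.6,

$$\vdash_{\mathcal{T}_i} \mathcal{M}(x_1) \land \cdots \land \mathcal{M}(y_1) \land \cdots \to \mathcal{M}(t^{\mathcal{M}}(\vec{x}, \vec{y})) \tag{7}$$

Since

$$\mathcal{M}(t^{\mathcal{M}}(\vec{x}, \vec{y})) \land \mathcal{B}^{\mathcal{M}}(\vec{x}, t^{\mathcal{M}}(\vec{x}, \vec{y}), \vec{z}) \to (\exists w)(\mathcal{M}(w) \land \mathcal{B}^{\mathcal{M}}(\vec{x}, w, \vec{z}))$$

is in $\Lambda$ over $L_i$, (7) and tautological implication yield

$$\vdash_{\mathcal{T}_i} \mathcal{M}(x_1) \land \cdots \land \mathcal{M}(y_1) \land \cdots \to \bigl(\mathcal{B}^{\mathcal{M}}(\vec{x}, t^{\mathcal{M}}(\vec{x}, \vec{y}), \vec{z}) \to (\exists w)(\mathcal{M}(w) \land \mathcal{B}^{\mathcal{M}}(\vec{x}, w, \vec{z}))\bigr)$$

One more tautological implication gives what we want:

$$\vdash_{\mathcal{T}_i} \mathcal{M}(x_1) \land \cdots \land \mathcal{M}(y_1) \land \cdots \land \mathcal{M}(z_1) \land \cdots \to \mathcal{A}^{\mathcal{M}}(\vec{x}, \vec{y}, \vec{z})$$

Ax3. Let $\mathcal{A}(x) \equiv x = x$. We want $\vdash_{\mathcal{T}_i} \mathcal{M}(x) \to x = x$, which holds by tautological implication and the fact that $x = x$ is logical over $L_i$.

Ax4. Here $\mathcal{A}[\vec{x}_n] \equiv t = s \to (\mathcal{B}[x \leftarrow t] \leftrightarrow \mathcal{B}[x \leftarrow s])$, where $\vec{x}_n$ includes all the participating free variables. Thus, using I.7.5, (6) translates into

$$\vdash_{\mathcal{T}_i} \mathcal{M}(x_1) \land \cdots \land \mathcal{M}(x_n) \to t^{\mathcal{M}} = s^{\mathcal{M}} \to (\mathcal{B}^{\mathcal{M}}[x \leftarrow t^{\mathcal{M}}] \leftrightarrow \mathcal{B}^{\mathcal{M}}[x \leftarrow s^{\mathcal{M}}])$$

which holds by tautological implication from the instance of the Leibniz axiom over $L_i$, $t^{\mathcal{M}} = s^{\mathcal{M}} \to (\mathcal{B}^{\mathcal{M}}[x \leftarrow t^{\mathcal{M}}] \leftrightarrow \mathcal{B}^{\mathcal{M}}[x \leftarrow s^{\mathcal{M}}])$. □

I have used above abbreviations such as “$\mathcal{B}^{\mathcal{M}} \to \mathcal{A}^{\mathcal{M}}$” for the abbreviation “$(\mathcal{B} \to \mathcal{A})^{\mathcal{M}}$”, etc.

We next direct our attention to some theory $\mathcal{T}$ over $L$.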
I.7.8 Definition. Let $\mathcal{T}$ be a theory over $L$ and $\mathfrak{I} = (L_i, \mathcal{T}_i, \mathcal{M}, \mathcal{I})$ be a formal interpretation of $L$ over the language $L_i$.

We say that $\mathfrak{I}$ is a formal interpretation of the theory (or a formal model of the theory) $\mathcal{T}$ just in case, for every nonlogical axiom $\mathcal{A}$ of $\mathcal{T}$, it is $\models_{\mathfrak{I}} \mathcal{A}$ (cf. I.7.4).

I.7.9 Theorem (Formal Soundness). If $\mathfrak{I} = (L_i, \mathcal{T}_i, \mathcal{M}, \mathcal{I})$ is a formal interpretation of the theory $\mathcal{T}$ over $L$, then, for any formula $\mathcal{A}$ over $L$, $\vdash_{\mathcal{T}} \mathcal{A}$ implies $\models_{\mathfrak{I}} \mathcal{A}$.


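Proof. We do induction on $\mathcal{T}$-theorems. For the basis, if $\mathcal{A}$ is logical, then we are done by I.7.7. If it is nonlogical, then we are done by definition (I.7.8).

Assume then that $\vdash_{\mathcal{T}} \mathcal{B} \to \mathcal{A}$ and $\vdash_{\mathcal{T}} \mathcal{B}$, and let $\vec{x}_n$ include all the free variables of these two formulas. By the I.H.,

$$\vdash_{\mathcal{T}_i} \mathcal{M}(x_1) \land \cdots \land \mathcal{M}(x_n) \to \mathcal{B}^{\mathcal{M}} \to \mathcal{A}^{\mathcal{M}}$$

and

$$\vdash_{\mathcal{T}_i} \mathcal{M}(x_1) \land \cdots \land \mathcal{M}(x_n) \to \mathcal{B}^{\mathcal{M}}$$

The above two and tautological implication yield

$$\vdash_{\mathcal{T}_i} \mathcal{M}(x_1) \land \cdots \land \mathcal{M}(x_n) \to \mathcal{A}^{\mathcal{M}}$$

Finally, let it be the case that $\mathcal{A} \equiv (\exists z)\mathcal{B} \to \mathcal{C}$, where $z$ is not free in $\mathcal{C}$, and moreover $\vdash_{\mathcal{T}} \mathcal{B} \to \mathcal{C}$. Let $z, \vec{x}_n$ (distinct variables) include all the free variables of $\mathcal{B} \to \mathcal{C}$.

By the I.H.,

$$\vdash_{\mathcal{T}_i} \mathcal{M}(z) \land \mathcal{M}(x_1) \land \cdots \land \mathcal{M}(x_n) \to \mathcal{B}^{\mathcal{M}} \to \mathcal{C}^{\mathcal{M}}$$

Hence (by tautological implication)

$$\vdash_{\mathcal{T}_i} \mathcal{M}(z) \land \mathcal{B}^{\mathcal{M}} \to \mathcal{M}(x_1) \to \cdots \to \mathcal{M}(x_n) \to \mathcal{C}^{\mathcal{M}}$$

By $\exists$-introduction,

$$\vdash_{\mathcal{T}_i} (\exists z)(\mathcal{M}(z) \land \mathcal{B}^{\mathcal{M}}) \to \mathcal{M}(x_1) \to \cdots \to \mathcal{M}(x_n) \to \mathcal{C}^{\mathcal{M}}$$

Utilizing tautological implication again, and Definition I.7.3, we are done:

$$\vdash_{\mathcal{T}_i} \mathcal{M}(x_1) \land \cdots \land \mathcal{M}(x_n) \to ((\exists z)\mathcal{B})^{\mathcal{M}} \to \mathcal{C}^{\mathcal{M}}$$
□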


It is a shame to call the next result just a “corollary”, for it is the result on which we will base the various relative consistency results in this volume (with the sole exception of those in Chapter VIII, where we work in the metatheory, mostly).

The corollary simply says that if the theory, $\mathcal{T}_i$, in which we interpret $\mathcal{T}$ is not “broken”, then $\mathcal{T}$ is consistent. This is the formal counterpart of the easy half of Gödel’s completeness theorem: If a theory $\mathcal{T}$ has a (metamathematical) model,† then it is consistent.

I.7.10 Corollary. Let $\mathfrak{I} = (L_i, \mathcal{T}_i, \mathcal{M}, \mathcal{I})$ be a formal model of the theory $\mathcal{T}$ over $L$. If $\mathcal{T}_i$ is consistent, then so is $\mathcal{T}$.

Proof. We prove the contrapositive. Let $\mathcal{T}$ be inconsistent; thus

$$\vdash_{\mathcal{T}} \lnot x = x$$

By I.7.9, $\vdash_{\mathcal{T}_i} \mathcal{M}(x) \to \lnot x = x$; thus, by I.4.23,

$$\vdash_{\mathcal{T}_i} (\exists x)\mathcal{M}(x) \to (\exists x)\lnot x = x$$

Since $\vdash_{\mathcal{T}_i} (\exists x)\mathcal{M}(x)$ by I.7.1, modus ponens yields $\vdash_{\mathcal{T}_i} (\exists x)\lnot x = x$, which along with $\vdash_{\mathcal{T}_i} (\forall x)x = x$ shows that $\mathcal{T}_i$ is inconsistent. □


We conclude the section with a brief discussion of a formal version of structure isomorphisms. In the case of “real structures” $\mathfrak{M} = (M, \ldots)$ and $\mathfrak{N} = (N, \ldots)$, we have shown in volume 1 that if $\phi : M \to N$ is a 1-1 correspondence that preserves the meaning of all basic symbols, then it preserves the meaning of everything, that is, if $\mathcal{A}$ is a formula and $a, b, \ldots$ are in $M$, then $\models_{\mathfrak{M}} \mathcal{A}[[a, b, \ldots]]$ iff $\models_{\mathfrak{N}} \mathcal{A}[[\phi(a), \phi(b), \ldots]]$.

We will use the formal version only once in this volume; thus we feel free to restrict it to our purposes. To begin with, we assume a language $L$ whose only nonlogical symbols are a unary and a binary predicate, which we will denote by $U$ and $\in$ respectively (we have set theory in mind, of course). The interpretations of $L$ whose isomorphisms we want to define and discuss are $\mathfrak{I} = (L_i, \mathcal{T}_i, \mathcal{M}, \mathcal{I})$ and $\mathfrak{K} = (L_i, \mathcal{T}_i, \mathcal{N}, \mathcal{J})$. Note that $\mathcal{T}_i$ and $L_i$ are the same in both interpretations.

Let now $\phi$ be a unary function symbol in $L_i$. It is a formal isomorphism of the two interpretations iff the following hold:

(1) $\vdash_{\mathcal{T}_i} \mathcal{N}(x) \leftrightarrow (\exists y)(\mathcal{M}(y) \land x = \phi(y))$ (“ontoness”)
(2) $\vdash_{\mathcal{T}_i} \mathcal{M}(x) \land \mathcal{M}(y) \to (x = y \leftrightarrow \phi(x) = \phi(y))$ (“1-1ness”‡)
(3) $\vdash_{\mathcal{T}_i} \mathcal{M}(x) \to (U^{\mathcal{M}}(x) \leftrightarrow U^{\mathcal{N}}(\phi(x)))$ (“preservation of $U$”§)
(4) $\vdash_{\mathcal{T}_i} \mathcal{M}(x) \land \mathcal{M}(y) \to (x \in^{\mathcal{M}} y \leftrightarrow \phi(x) \in^{\mathcal{N}} \phi(y))$ (“preservation of $\in$”)

If $L$ contains a constant $c$, then we must also have $\phi(c^{\mathcal{M}}) = c^{\mathcal{N}}$. This is met in our only application later on by having $c^{\mathcal{M}} = c = c^{\mathcal{N}}$ and $\phi(c) = c$.

† It is a well-established habit not to doubt the metatheory’s reliability, a habit that has had its critics, including Hilbert, whose metatheory sought “reliability” in simplicity. But we are not getting into that discussion again.

‡ The $\to$ half of $\leftrightarrow$ we get for free by an application of the Leibniz axiom.


1.7.11 Remark. In what follows we will present quite a number of formal 
proofs. It is advisable then at this point to offer a proof-writing tool that will, 
hopefully, shorten many of these proofs. 

Whenever the mathematician is aware (of proofs) of a chain of equivalences 
such as 


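$$\mathscr{A}_1 \leftrightarrow \mathscr{A}_2,\ \mathscr{A}_2 \leftrightarrow \mathscr{A}_3,\ \mathscr{A}_3 \leftrightarrow \mathscr{A}_4,\ \ldots,\ \mathscr{A}_{n-1} \leftrightarrow \mathscr{A}_n$$

he often writes instead

$$\mathscr{A}_1 \leftrightarrow \mathscr{A}_2 \leftrightarrow \mathscr{A}_3 \leftrightarrow \mathscr{A}_4 \leftrightarrow \cdots \leftrightarrow \mathscr{A}_{n-1} \leftrightarrow \mathscr{A}_n$$

i.e., abusing notation and treating “$\leftrightarrow$” conjunctionally rather than (the correct)
associatively. This parallels the (ab)uses

$a < b < c$ for $a < b$ and $b < c$
and
$a = b = c$ for $a = b$ and $b = c$

Of course, such a chain also proves $\mathscr{A}_1 \leftrightarrow \mathscr{A}_n$ by tautological equivalence.
Moreover, $\mathscr{A}_1$ is provable iff $\mathscr{A}_n$ is (by tautological implication).

More generally, the chain may involve a mix of “$\leftrightarrow$” and “$\to$”. Again
tautological equivalence yields a proof of $\mathscr{A}_1 \to \mathscr{A}_n$, this time.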

Dijkstra and Scholten (1990), Gries and Schneider (1994), and Tourlakis 
(2000a, 2000b, 2001) suggest a vertical layout of such chains and say that such 
a chain constitutes a calculational proof: 

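$$
\begin{array}{ll}
 & \mathscr{A}_1 \\
\leftrightarrow \text{ or } \to & \langle \text{annotation/reason} \rangle \\
 & \mathscr{A}_2 \\
\leftrightarrow \text{ or } \to & \langle \text{annotation/reason} \rangle \\
 & \vdots \\
\leftrightarrow \text{ or } \to & \langle \text{annotation/reason} \rangle \\
 & \mathscr{A}_n
\end{array}
$$

from which

$$\vdash_{\mathcal{T}} \mathscr{A}_1 \to \mathscr{A}_n$$

follows, where $\mathcal{T}$ is the theory within which we reasoned above.
Moreover, if $\vdash_{\mathcal{T}} \mathscr{A}_1$, then also $\vdash_{\mathcal{T}} \mathscr{A}_n$ by modus ponens.

† We write “$U^{\mathfrak{I}}(x)$” rather than “$U^{\mathfrak{I}}x$” as this will be the habitual notation in the context of set
theory.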
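As a small illustration of the vertical layout, here is a calculational proof of a familiar propositional equivalence. The formulas and the annotations are chosen here only as an example and are not taken from the text.

```latex
\[
\begin{array}{ll}
 & \lnot(\mathscr{A} \to \mathscr{B}) \\
\leftrightarrow & \langle \text{definition of } \to \rangle \\
 & \lnot(\lnot\mathscr{A} \lor \mathscr{B}) \\
\leftrightarrow & \langle \text{De Morgan} \rangle \\
 & \lnot\lnot\mathscr{A} \land \lnot\mathscr{B} \\
\leftrightarrow & \langle \text{double negation and Leibniz rule} \rangle \\
 & \mathscr{A} \land \lnot\mathscr{B}
\end{array}
\]
```

Reading only the top and bottom lines, the chain yields $\vdash \lnot(\mathscr{A} \to \mathscr{B}) \leftrightarrow \mathscr{A} \land \lnot\mathscr{B}$.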


We can now prove: 


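I.7.12 Lemma. Let $L$ be a language with just $U$ and $\in$, above, as its nonlogical
symbols, and let $\phi$ be a formal isomorphism of its two interpretations $\mathfrak{I} =
(L_i, \mathcal{T}_i, \mathscr{M}, I)$ and $\mathfrak{J} = (L_i, \mathcal{T}_i, \mathscr{N}, J)$ in the sense of (1)-(4). Then, for
every formula $\mathscr{A}(\vec{x}_n)$ over $L$,

$$\vdash_{\mathcal{T}_i} \mathscr{M}(x_1) \land \cdots \land \mathscr{M}(x_n) \to \bigl(\mathscr{A}^{\mathfrak{I}}(\vec{x}_n) \leftrightarrow \mathscr{A}^{\mathfrak{J}}(\phi(x_1), \ldots, \phi(x_n))\bigr)$$
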
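Proof. Induction on formulas. For the atomic ones the statement is just (2)-(4)
above. We skip the trivial $\lor$ and $\lnot$ cases and look into $\mathscr{A}(\vec{x}_n) \equiv (\exists y)\mathscr{B}(y, \vec{x}_n)$.
First,

$$\mathscr{A}^{\mathfrak{I}}(\vec{x}_n) \equiv (\exists y)(\mathscr{M}(y) \land \mathscr{B}^{\mathfrak{I}}(y, \vec{x}_n)) \qquad (5)$$

and

$$\mathscr{A}^{\mathfrak{J}}(\phi(x_1), \ldots, \phi(x_n)) \equiv (\exists y)(\mathscr{N}(y) \land \mathscr{B}^{\mathfrak{J}}(y, \phi(x_1), \ldots, \phi(x_n))) \qquad (6)$$

We now freeze the $\vec{x}_n$ and work in $\mathcal{T}_i + \mathscr{M}(x_1) \land \cdots \land \mathscr{M}(x_n)$. We calculate
as follows:

$$
\begin{array}{ll}
 & (\exists y)(\mathscr{N}(y) \land \mathscr{B}^{\mathfrak{J}}(y, \phi(x_1), \ldots, \phi(x_n))) \\
\leftrightarrow & \langle \text{by (1) and Leibniz rule; } z \text{ a new variable} \rangle \\
 & (\exists y)((\exists z)(\mathscr{M}(z) \land y = \phi(z)) \land \mathscr{B}^{\mathfrak{J}}(y, \phi(x_1), \ldots, \phi(x_n))) \\
\leftrightarrow & \langle \text{newness of } z \rangle \\
 & (\exists z)(\exists y)(\mathscr{M}(z) \land y = \phi(z) \land \mathscr{B}^{\mathfrak{J}}(y, \phi(x_1), \ldots, \phi(x_n))) \\
\leftrightarrow & \langle \text{one point rule (I.6.2) and Leibniz rule} \rangle \\
 & (\exists z)(\mathscr{M}(z) \land \mathscr{B}^{\mathfrak{J}}(\phi(z), \phi(x_1), \ldots, \phi(x_n))) \\
\leftrightarrow & \langle \text{I.H. and Leibniz rule} \rangle \\
 & (\exists z)(\mathscr{M}(z) \land \mathscr{B}^{\mathfrak{I}}(z, \vec{x}_n))
\end{array}
$$

The top line of our calculation is (6), while the bottom is (5) (within bound
variable renaming); thus we are done by the deduction theorem.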


I.8. The Incompleteness Theorems


This brief section is only meant to acquaint the reader with what Gödel’s
incompleteness theorems are about. The second theorem in particular is one that
we will invoke a number of times in this volume; therefore it is desirable to 
present here the statements of these two theorems and outline, at the intuitive 
level, what makes them tick. A full exposition and complete proofs for both 
theorems can be found in our companion volume Mathematical Logic. 

Now, Gödel’s completeness theorem asserts the adequacy of the syntactic
proof apparatus for characterization of “truth”. On the other hand, his incom- 
pleteness theorems assert the inadequacy of this syntactic apparatus for captur- 
ing “truth”. The contradiction is only apparent. Completeness says that truth 
of a formula in all concrete worlds (all models) of a first order theory can be 
adequately captured — the formula is provable. Incompleteness addresses truth 
in one world. Often such a world is the one that matters: The intended or natural 
model of a theory that we want to study axiomatically. A formula of the theory 
that is true (in the Tarski semantics sense) in the intended model is, naturally, 
called really true. An example of such a special world is our familiar structure, 
$\mathfrak{N} = (\mathbb{N}; S, +, \times; <; 0)$. Peano arithmetic is the associated formal theory that
attempts to characterize this structure. 

The first incompleteness theorem in its semantic version says that Peano 
arithmetic, or indeed any reasonably well-constructed extension, cannot do a 
very good job of proving all the formulas that are really true (in $\mathfrak{N}$). It misses
infinitely many. Hence the term “incompleteness”, or, more emphatically,
“incompletableness”, the latter because we cannot make incompleteness go away
by throwing axioms at it. 

Let us dispense with some terminology before we can actually state and 
discuss the theorems. 


I.8.1 Definition. The language for Peano arithmetic we denote by $L_{\mathbf{N}}$. It has
the nonlogical symbols listed below along with their intended interpretations,
where boldface denotes the formal symbol while lightface denotes the “real”
(metamathematical) symbol:

(1) $\mathbf{S}$ (successor): $\mathbf{S}^{\mathfrak{N}} = S$, where $S(x) = x + 1$ for all $x \in \mathbb{N}$
(2) $\boldsymbol{+}$ (addition): $\boldsymbol{+}^{\mathfrak{N}} = +$
(3) $\boldsymbol{\times}$ (multiplication): $\boldsymbol{\times}^{\mathfrak{N}} = \times$
(4) $\boldsymbol{<}$ (less than): $\boldsymbol{<}^{\mathfrak{N}} = {<}$
(5) $\mathbf{0}$ (zero): $\mathbf{0}^{\mathfrak{N}} = 0$

The abbreviation $\widetilde{n}$ is pronounced the numeral $n$, and it stands for

$$\underbrace{\mathbf{S} \ldots \mathbf{S}}_{n \text{ of them}} \mathbf{0}$$


As they are metamathematical abbreviations, we are not using boldface type 
for numerals. 
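The numeral abbreviation can be mimicked with plain strings. The sketch below is only an illustration under stated assumptions: the formal symbols are represented by the ASCII characters `S` and `0`, and the function names `numeral` and `denotation` are made up here.

```python
def numeral(n: int) -> str:
    """Return the numeral for n as a string: n copies of 'S' followed by '0'."""
    if n < 0:
        raise ValueError("numerals are defined only for natural numbers")
    return "S" * n + "0"


def denotation(term: str) -> int:
    """Evaluate a numeral in the intended model: each 'S' adds 1, starting from 0."""
    if not term.endswith("0") or set(term[:-1]) - {"S"}:
        raise ValueError("not a numeral")
    return len(term) - 1


print(numeral(3))              # SSS0
print(denotation(numeral(3)))  # 3
```

The round trip reflects the intended reading: the numeral for n is a formal term, while n itself is the metamathematical number it denotes.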


I.8.2 Definition. A theory $\Gamma$ (this names the set of nonlogical axioms) over $L_{\mathbf{N}}$
is correct over $\mathfrak{N}$ just in case $\mathscr{A} \in \Gamma$ implies $\models_{\mathfrak{N}} \mathscr{A}$.

That is, all its nonlogical axioms are true in $\mathfrak{N}$ (or really true, if $\mathfrak{N}$ happens
to be the intended model).


The term correct is used by Smullyan (1992). Some authors say “sound”, but 
this is not as apt a terminology, for sound means something else: All first order 
theories are sound, but some theories over $L_{\mathbf{N}}$, although sound, may fail to
be correct. □


I.8.3 Definition. A theory $\mathcal{T}$ over some language $L$ is simply complete, or just
complete, iff, for all sentences $\mathscr{A}$ over $L$, we have at least one of $\vdash_{\mathcal{T}} \mathscr{A}$ and
$\vdash_{\mathcal{T}} \lnot\mathscr{A}$.

It is simply incomplete, or just incomplete, otherwise. An incomplete theory
thus fails to decide at least one sentence $\mathscr{A}$ over $L$, that is, neither $\vdash_{\mathcal{T}} \mathscr{A}$ nor
$\vdash_{\mathcal{T}} \lnot\mathscr{A}$ holds.

Such an $\mathscr{A}$ is called an undecidable sentence.


Pause. Why “sentence”? Why not define the above concepts (complete, etc.) 
in terms of arbitrary formulas over L? 


Thus, in the case of an incomplete theory and for any particular one of its 
models — including the intended one — there is at least one sentence of the 
language which is (Tarski-)true in said model, but is not provable. Such is any 
undecidable sentence $\mathscr{A}$, for it or $\lnot\mathscr{A}$ must be true in any given model.

An inconsistent theory is complete, of course. □


I.8.4 Definition. A theory $\mathcal{T}$ in the language of Peano arithmetic, $L_{\mathbf{N}}$, is
$\omega$-consistent just in case there is no formula $\mathscr{A}(x)$ over $L_{\mathbf{N}}$ such that all of
the following hold:

$$\vdash_{\mathcal{T}} \lnot\mathscr{A}(\widetilde{n}) \quad \text{for all } n \in \mathbb{N}$$

and $\vdash_{\mathcal{T}} (\exists x)\mathscr{A}(x)$. Otherwise it is $\omega$-inconsistent.


An $\omega$-consistent theory fails to prove something over its language; thus it is
consistent. The converse is not true, a fact first observed by Tarski. This observation
is a corollary of the techniques applied to prove Gödel’s (first) incompleteness
theorem (see our companion volume for the full story). □


We can now state: 


I.8.5 Theorem (First Incompleteness Theorem, Semantic Version). Any
correct extension of formal Peano arithmetic, effected in such a manner that 
the new set of axioms remains recognizable, will fail to prove at least one really 
true sentence. 

It follows that any such extension is a simply incomplete theory. 


By “a set $A$ is recognizable” we mean that we can solve the membership
problem, “$x \in A$?”, by algorithmic, or mechanical, means. That is, in our case here,
we can test any formula and find out, in a finite number of steps, whether it is
an axiom or not. The technical term is recursive, but we do not intend to get
into that here.†

The first word in the theorem is very important: “any”. It shows that the 
theory (Peano arithmetic) is not just incomplete (take the trivial extension that 
adds nothing) but, indeed, incompletable: For, add to Peano arithmetic one 
really true sentence that it fails to prove. This effects an extension that is correct 
(why?) and constitutes a recognizable set of axioms. Repeat now, adding a 
really true sentence that this theory cannot prove. And so on. 

In particular, this says that each of these extensions misses not one but in- 
finitely many really true sentences (after all, we are effecting an infinite sequence 
of extensions; after each extension there are infinitely many more to go). □


† A fair amount of recursion theory is covered in volume 1, Mathematical Logic, where, in
particular, recursive sets are defined and studied.

Why is Gödel’s theorem true? The idea (in Gödel’s original proof) is very
old, based on games ancient Greek philosophers liked to play: The so-called
“liar’s paradox”.† Through an ingenious arithmetization of the language Gödel
managed to construct a sentence $\mathscr{G}$ whose natural interpretation said “I am not
a theorem”.

Let us see then if Peano arithmetic (or a correct and recognizable extension‡)
can prove $\mathscr{G}$. Well, if it can, then, by correctness and soundness,§ $\mathscr{G}$ is really
true, i.e., it is not a theorem. This contradicts what we have just assumed.


So it must be that $\mathscr{G}$ is not a theorem.


But then, $\mathscr{G}$ is really true, for it says just that. We found a true sentence, $\mathscr{G}$,
that is not provable.

We have more. Since the theory is correct (and sound), and $\lnot\mathscr{G}$ is really
false, this latter sentence is not provable. Thus the theory (as extended) is simply
incomplete; $\mathscr{G}$ is undecidable.


Where have we used, in the above argument, the part of the assumptions that 
requires the set of nonlogical axioms to be recognizable? We actually did not 
use it explicitly, since our argument was too far removed from the level of detail 
that would exhibit such dependences on assumptions. 

Suffice it to say that, among other things, the assumption on recognizability 
prevents us from cheating, and thus from invalidating Gödel’s theorem: Why don’t we
just add all the really true sentences to the set of axioms and form a complete 
extension of Peano arithmetic? Because the recognizability assumption does 
not allow this. Such an extension results in a non-recognizable set of axioms
(cf. volume 1). 

There is another way to look at the intuitive reason behind the incompletable- 
ness phenomenon. This relies on results of recursion theory. Imagine beings 
who live in a world where set theorists call a set countable just in case a
mechanical procedure, or algorithm, exists to enumerate all the set’s members, 
possibly with repetitions. Such beings call any set that fails to be enumerable 
in this manner uncountable. Intuitively, in the eyes of the inhabitants of this 
world, this latter type of set has far too many objects. 

† Attributed to Epimenides. He, a Cretan, said: “All Cretans are liars”. So, was his statement true?
Gödel’s proof is based on a variation of this. A person says: “I am lying.” Well, is he, or is he
not?

‡ The exact form of $\mathscr{G}$ depends on the extension at hand.

§ Soundness we have for free. Correctness guarantees the real truth of the nonlogical axioms.
Soundness extends this to all theorems.

In this world the set of theorems of any extension of Peano arithmetic, by
an arbitrary recognizable set of new axioms, is countable. The reason can be
seen intuitively as a consequence of the recognizability of the set of nonlogical
axioms. This property allows us to build systematically (algorithmically) an
infinite list† of all theorems.


Digression. Here is how. To simplify matters assume that the alphabet of the
language $L_{\mathbf{N}}$ is finite (for example, variables are really the strings

$$v\underbrace{|\ldots|}_{n+1}v$$

denoting what we may call $v_n$, for $n \ge 0$, built from just two symbols, “$v$” and
“$|$”).

We convert every proof into a single string by adding a new symbol to our 
alphabet, say #, which is used as a separator and “glue” — between formulas — 
as we concatenate all the formulas of a proof into a single string, from left to 
right. We will still call the result of this concatenation a “proof”. 

We now form two separate infinite lists, algorithmically. The first is the list of 
all strings over the alphabet of $L_{\mathbf{N}}$, as the latter was augmented by the addition
of #. This listing can be effected by enumerating by string length, and then,
within each length group, lexicographically (alphabetically).‡

The second list is built as follows. Every time a string A is put in the first 
list, we test algorithmically whether or not A is a proof. We can do this, for, 
firstly, we can recognize if it is of the right form, that is, 


$$A_1 \# A_2 \# \cdots \# A_n$$

where each $A_i$ is a nonempty string over $L_{\mathbf{N}}$.

Secondly, if it is of the right form, we can then check whether indeed $A$ is a
proof: Whether or not $A_j$ is the result of a primary rule of inference applied to
$A_i$ (and possibly to $A_k$) for some $i < j$ (and $k < j$) can be determined from the
form of the strings $A_i$, $A_j$, and $A_k$. The same is true of whether $A_j \in \Lambda$ or not.
Finally the recognizability assumption means that we can also check whether
or not $A_j$ is a nonlogical axiom.

If (and only if) A passes the above test, i.e., it is a proof, then we add its last 
formula (the one to the right of the rightmost #) to the second list. 
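The two lists of the digression can be sketched in Python as follows. This is a toy illustration under stated assumptions, not the construction for Peano arithmetic: the function names, the one-letter alphabet, the single axiom `"a"`, and the one-premise rule "from X infer X followed by aa" are all made up here so that the proof check stays a few lines long.

```python
from itertools import count, islice, product


def shortlex(alphabet):
    """First list: every nonempty string, by length, then alphabetically."""
    symbols = sorted(alphabet)  # a fixed alphabetical order of the symbols
    for length in count(1):
        for letters in product(symbols, repeat=length):
            yield "".join(letters)


def is_proof(candidate, is_axiom, follows):
    """Check that candidate has the form A1#A2#...#An and is a proof."""
    formulas = candidate.split("#")
    if any(formula == "" for formula in formulas):
        return False  # wrong form: some A_j is empty
    return all(
        is_axiom(formula)
        or any(follows(earlier, formula) for earlier in formulas[:j])
        for j, formula in enumerate(formulas)
    )


def theorems(alphabet, is_axiom, follows):
    """Second list: the last formula of every string that passes the proof test."""
    for candidate in shortlex(set(alphabet) | {"#"}):
        if is_proof(candidate, is_axiom, follows):
            yield candidate.split("#")[-1]


# Toy system (an assumption): the only axiom is "a"; from X infer X + "aa".
def toy_axiom(formula):
    return formula == "a"


def toy_rule(premise, conclusion):
    return conclusion == premise + "aa"


print(list(islice(theorems({"a"}, toy_axiom, toy_rule), 4)))
# ['a', 'a', 'a', 'aaa']
```

Repetitions in the output are expected, since the same theorem is the last line of many proofs; what matters is that every theorem eventually shows up, and that each step uses only mechanical tests.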

† One can “build an infinite list algorithmically” is jargon that means the following: One has an
algorithm which, for any $n \in \mathbb{N}$, will generate the $n$th element of the list in a finite number of
steps.

‡ We assume that we have fixed an alphabetical order of the finitely many symbols of our alphabet.

Now, it turns out that in such a world the set of all really true sentences of
arithmetic is uncountable (this is proved in volume 1). Thus, there are infinitely
many really true sentences that are not provable, no matter which theory (that
produces a countable set of theorems) we have constructed on top of Peano
arithmetic.† □


While Gödel worked with the $\mathscr{G}$ that says “I am not a theorem”, his result
was purely syntactic. We state it without proof below. 


I.8.6 Theorem (First Incompleteness Theorem, Syntactic Version). Any
$\omega$-consistent extension of formal Peano arithmetic, effected in such a manner
that the new set of axioms remains recognizable, has undecidable sentences, and
thus is a simply incomplete theory. In particular, one can construct a formula
$\mathscr{G}$ which says “I am not a theorem of this theory”. This formula is undecidable.


I.8.7 Remark. In Gödel’s proof simple (ordinary) consistency suffices to prove
the unprovability of $\mathscr{G}$. $\omega$-consistency is called upon to prove that $\lnot\mathscr{G}$ is not a
theorem either. □


With a different “$\mathscr{G}$” (let us call it $\mathscr{G}'$), Rosser extended I.8.6 to the following
result: 


I.8.8 Theorem (Gödel-Rosser Incompleteness Theorem). Any (simply)
consistent extension of formal Peano arithmetic, effected in such a manner that the
new set of axioms remains recognizable, has undecidable sentences and thus is 
a simply incomplete theory. 


We already mentioned that $\omega$-consistency is strictly stronger than consistency.
Similarly, it can be seen, once the details of the Gödel argument are laid out,
that correctness is strictly stronger than $\omega$-consistency (cf. volume 1). □


† It is straightforward to see that if there were only finitely many really true sentences that the formal
system missed, these could be put into a finite table $T$, which we can check for membership,
trivially. But then, we have an algorithm that can check a formula for membership in the set union
between the theory’s axioms and $T$ (just search the table; if not found there, then search the set
of nonlogical axioms). Thus, adding the formulas of $T$ to the theory, we have an extension with
a recognizable set of axioms. This new theory trivially has all the formulas in $T$ as theorems.
Hence it has all the really true formulas as theorems ($T$ is all that the original theory missed),
contradicting the fact that this set is uncountable, while the set of theorems is still countable.

The second incompleteness theorem of Gödel is, more or less, a formalization
of the first. In plain English, it says that one of the really true sentences that Peano
arithmetic (or, for that matter, any consistent and recognizable extension)
cannot prove is its own consistency.
This fact showed that Hilbert’s finitary techniques, in the metatheory, were
inadequate for his purposes: Intuitively, finitary techniques are codable by integers
and therefore are expressible and usable in formal Peano arithmetic.

Now we have two conflicting situations: Hilbert’s belief that finitary tech- 
niques can settle the consistency (or otherwise) of formal theories has had as 
a corollary the expectation that Peano arithmetic could settle (prove) its own 
consistency (via the formalized finitary tools used within the theory). On the 
other hand, Gödel’s second incompleteness theorem proved that this cannot be
done. 


I.8.9 Theorem (Gödel’s Second Incompleteness Theorem). Any (simply)
consistent extension of formal Peano arithmetic, effected in such a manner 
that the new set of axioms remains recognizable, is unable to prove its own 
consistency. 


The detailed proof takes several tens of pages to be fully spelled out (cf.
volume 1). However, the proof idea is very simple: Let us fix attention on an
extension $\mathcal{T}$ as above, and let “Con” be a sentence whose natural interpretation
(over $\mathfrak{N}$) says that $\mathcal{T}$ is consistent. Let also $\mathscr{G}$ be the sentence that says “I
am not a theorem of $\mathcal{T}$”.

Now, Gödel’s first theorem (partly) asserts the truth (over $\mathfrak{N}$) of

$$\mathrm{Con} \to \mathscr{G} \qquad (1)$$

i.e., “if $\mathcal{T}$ is consistent, then $\mathscr{G}$ is true, hence is not provable, for it says just
that”.

The quoted sentence above is correct, for $\omega$-consistency came into play only
to show that Gödel’s $\mathscr{G}$ was not refutable. This part of the first theorem is not
needed towards the proof of the second incompleteness theorem.

Imagine now that we have managed to formalize the argument leading to (1)
so that instead of truth in $\mathfrak{N}$ we can speak of provability in $\mathcal{T}$:†

$$\vdash_{\mathcal{T}} \mathrm{Con} \to \mathscr{G}$$

It follows that if $\vdash_{\mathcal{T}} \mathrm{Con}$, then $\vdash_{\mathcal{T}} \mathscr{G}$ by modus ponens, contradicting the first
incompleteness theorem. □

† While this is in principle possible (to formalize the argument that leads to the truth of (1)),
this is not exactly how one proves the deducibility of (1), and hence the second incompleteness
theorem, in practice.


I.8.10 Remark. The contribution of Peano arithmetic is that it allows one
to carry out Gödel’s arithmetization formally, and to speak about provability,
within the formal theory. In particular, it allows self-reference.†

Clearly, this machinery exists in all consistent (and recognizable‡) extensions
of Peano arithmetic. It also exists in formal theories that may not be, exactly,
extensions but are powerful enough to “contain”, or, more accurately, simulate
Peano arithmetic. Such a theory is ZFC set theory. Clearly it is not an extension,
for the languages do not even match. However we can see that since ZFC is the
foundation of all mathematics, in particular one must be able to do arithmetic
within ZFC.§

Thus the incompletableness phenomenon manifests itself in ZFC as well. In
particular, ZFC has undecidable sentences (first incompleteness theorem), and
it cannot prove its own consistency (second incompleteness theorem).¶ □


I.9. Exercises 


I.1. Prove that the closure of $\mathcal{I} = \{3\}$ under the two relations $z = x + y$ and
$z = x - y$ is the set $\{3k : k \in \mathbb{Z}\}$.

I.2. The pair that effects the definition of Term (I.1.5, p. 13) is unambiguous.

I.3. The pair that effects the definition of Wff (I.1.8, p. 15) is unambiguous.

I.4. With reference to I.2.13 (p. 26), prove that if all the $g_Q$ and $h$ are defined
everywhere on their input sets (i.e., they are “total”), that is, $\mathcal{I}$ for $h$
and $A \times Y^r$ for $g_Q$ and $(r + 1)$-ary $Q$, then $f$ is defined everywhere on
$\mathrm{Cl}(\mathcal{I}, \mathcal{R})$.

I.5. Prove that for every formula $\mathscr{A}$ in Prop (I.3.2, p. 29) the following is
true: Every nonempty proper prefix (I.1.4, p. 13) of the string $\mathscr{A}$ has an
excess of left brackets.


† Briefly, imagine that through arithmetization we have managed to represent every formula, and
every sequence of formulas, of $L_{\mathbf{N}}$ by a numeral. Gödel defined a formula $\mathscr{P}(x, y)$ which “says”
that the formula coded $x$ is provable by a proof coded $y$. Self-reference allows one to find a
natural number $n$ such that the numeral $\widetilde{n}$ codes the formula $\lnot(\exists y)\mathscr{P}(\widetilde{n}, y)$. Clearly, this last
formula says that “the formula coded by $n$ is not a theorem”. But it is talking about itself, for $n$
is its own code. In short, $\mathscr{G} \equiv \lnot(\exists y)\mathscr{P}(\widetilde{n}, y)$.

‡ Recognizability is at the heart of being able to “talk about” provability within the formal
theory.

§ More concretely, and without invoking faith, one can easily show that there is an interpretation,
in the sense of Section I.7, of Peano arithmetic within ZFC. This becomes clear in Chapter V,
where the set of formal natural numbers, $\omega$, is defined.

¶ The formal statement of the incompleteness theorems starts with the hypothesis “If ZFC is
consistent”.

I.6. Prove that any non-prime $\mathscr{A}$ in Prop has uniquely determined immediate
predecessors.

I.7. For any formula $\mathscr{A}$ and any two valuations $v$ and $v'$, $\bar{v}(\mathscr{A}) = \bar{v}'(\mathscr{A})$ if $v$
and $v'$ agree on all the propositional variables that occur in $\mathscr{A}$.

I.8. Prove that $\mathscr{A}[x \leftarrow t]$ is a formula (whenever it is defined) if $t$ is a term.

I.9. Prove that Definition I.3.12 does not depend on our choice of new
variables $\vec{z}_r$.

I.10. Prove that $\vdash (\forall x)(\forall y)\mathscr{A} \leftrightarrow (\forall y)(\forall x)\mathscr{A}$.

I.11. Prove I.4.23.

I.12. Prove I.4.24.

I.13. (1) Show that $x < y \vdash y < x$ ($<$ is some binary predicate symbol; the
choice of symbol here is meant to provoke).
(2) Show informally that $\nvdash x < y \to y < x$.
(Hint. Use the soundness theorem.)
(3) Does this invalidate the deduction theorem? Explain.

I.14. Prove I.4.25.

I.15. Suppose that $\vdash t_i = s_i$ for $i = 1, \ldots, m$, where the $t_i$, $s_i$ are arbitrary
terms. Let $\mathscr{F}$ be a formula, and $\mathscr{F}'$ be obtained from it by replacing any
number of occurrences of $t_i$ in $\mathscr{F}$ (not necessarily all) by $s_i$. Prove that
$\vdash \mathscr{F} \leftrightarrow \mathscr{F}'$.

I.16. Suppose that $\vdash t_i = s_i$ for $i = 1, \ldots, m$, where the $t_i$, $s_i$ are arbitrary
terms. Let $r$ be a term, and $r'$ be obtained from it by replacing any number
of occurrences of $t_i$ in $r$ (not necessarily all) by $s_i$. Prove that $\vdash r = r'$.

I.17. Settle the “Pause” following I.4.21.

I.18. Prove I.4.27.

I.19. Prove that $\vdash x = y \to y = x$.

I.20. Prove that $\vdash x = y \land y = z \to x = z$.

I.21. Prove (semantically, without using soundness) that $\mathscr{A} \models (\forall x)\mathscr{A}$.

I.22. Suppose that $x$ is not free in $\mathscr{A}$. Prove that $\vdash \mathscr{A} \to (\forall x)\mathscr{A}$ and
$\vdash (\exists x)\mathscr{A} \to \mathscr{A}$.

I.23. Prove the distributive laws:
$\vdash (\forall x)(\mathscr{A} \land \mathscr{B}) \leftrightarrow (\forall x)\mathscr{A} \land (\forall x)\mathscr{B}$ and
$\vdash (\exists x)(\mathscr{A} \lor \mathscr{B}) \leftrightarrow (\exists x)\mathscr{A} \lor (\exists x)\mathscr{B}$.

I.24. Prove $\vdash (\exists x)(\forall y)\mathscr{A} \to (\forall y)(\exists x)\mathscr{A}$ with two methods: first using the
auxiliary constant method, next exploiting monotonicity.

I.25. Prove $\vdash (\exists x)(\mathscr{A} \to (\forall x)\mathscr{A})$.


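In what follows let us denote by $\Lambda_1$ the pure logic of Section I.3 (I.3.13
and I.3.15). Let us now introduce a new pure logic, which we will call $\Lambda_2$.
This is exactly the same as $\Lambda_1$, except that we have a different axiom group
Ax1. Instead of adopting all tautologies, we only adopt the following four
logical axiom schemata of group Ax1:†

(1) $\mathscr{A} \lor \mathscr{A} \to \mathscr{A}$
(2) $\mathscr{A} \to \mathscr{A} \lor \mathscr{B}$
(3) $\mathscr{A} \lor \mathscr{B} \to \mathscr{B} \lor \mathscr{A}$
(4) $(\mathscr{A} \to \mathscr{B}) \to (\mathscr{C} \lor \mathscr{A} \to \mathscr{C} \lor \mathscr{B})$

† $\lnot$ and $\lor$ are the primary symbols; $\to$, $\land$, $\leftrightarrow$ are defined in the usual manner.

$\Lambda_2$ is due to Hilbert (actually, he also included associativity in the axioms,
but, as Gentzen has proved, this was deducible from the system as here given;
therefore, it was not an independent axiom; see Exercise I.35). In the exercises
below we write $\vdash_i$ for $\vdash_{\Lambda_i}$, $i = 1, 2$.

I.26. Show that for all $\mathscr{F}$ and set of formulas $\Gamma$, if $\Gamma \vdash_2 \mathscr{F}$ holds then so does
$\Gamma \vdash_1 \mathscr{F}$.

Our aim is to see that the logics $\Lambda_1$ and $\Lambda_2$ are equivalent, i.e., have exactly the
same theorems. In view of the trivial Exercise I.26 above, what remains to be
shown is that every tautology is a theorem of $\Lambda_2$. One particular way to prove
this is through the following sequence of $\Lambda_2$-facts.

I.27. Show the transitivity of $\to$ in $\Lambda_2$:
$\mathscr{A} \to \mathscr{B}, \mathscr{B} \to \mathscr{C} \vdash_2 \mathscr{A} \to \mathscr{C}$ for all $\mathscr{A}$, $\mathscr{B}$, and $\mathscr{C}$.

I.28. Show that $\vdash_2 \mathscr{A} \to \mathscr{A}$ (i.e., $\vdash_2 \lnot\mathscr{A} \lor \mathscr{A}$) for any $\mathscr{A}$.

I.29. For all $\mathscr{A}$, $\mathscr{B}$ show that $\vdash_2 \mathscr{A} \to \mathscr{B} \lor \mathscr{A}$.

I.30. Show that for all $\mathscr{A}$ and $\mathscr{B}$, $\mathscr{A} \vdash_2 \mathscr{B} \to \mathscr{A}$.

I.31. Show that for all $\mathscr{A}$, $\vdash_2 \lnot\lnot\mathscr{A} \to \mathscr{A}$ and $\vdash_2 \mathscr{A} \to \lnot\lnot\mathscr{A}$.

I.32. For all $\mathscr{A}$ and $\mathscr{B}$, show that $\vdash_2 (\mathscr{A} \to \mathscr{B}) \to (\lnot\mathscr{B} \to \lnot\mathscr{A})$. Conclude
that $\mathscr{A} \to \mathscr{B} \vdash_2 \lnot\mathscr{B} \to \lnot\mathscr{A}$.
(Hint. $\vdash_2 \mathscr{A} \to \lnot\lnot\mathscr{A}$.)

I.33. Show that $\mathscr{A} \to \mathscr{B} \vdash_2 (\mathscr{B} \to \mathscr{C}) \to (\mathscr{A} \to \mathscr{C})$ for all $\mathscr{A}$, $\mathscr{B}$, $\mathscr{C}$.

I.34. (Proof by cases in $\Lambda_2$.) Show for all $\mathscr{A}$, $\mathscr{B}$, $\mathscr{C}$, $\mathscr{D}$,
$\mathscr{A} \to \mathscr{B}, \mathscr{C} \to \mathscr{D} \vdash_2 \mathscr{A} \lor \mathscr{C} \to \mathscr{B} \lor \mathscr{D}$.

I.35. Show for all $\mathscr{A}$, $\mathscr{B}$, $\mathscr{C}$ that
(1) $\vdash_2 \mathscr{A} \lor (\mathscr{B} \lor \mathscr{C}) \to (\mathscr{A} \lor \mathscr{B}) \lor \mathscr{C}$ and
(2) $\vdash_2 (\mathscr{A} \lor \mathscr{B}) \lor \mathscr{C} \to \mathscr{A} \lor (\mathscr{B} \lor \mathscr{C})$.

I.36. Deduction theorem in “propositional” $\Lambda_2$. Prove that if $\Gamma, \mathscr{A} \vdash_2 \mathscr{B}$
using only modus ponens, then also $\Gamma \vdash_2 \mathscr{A} \to \mathscr{B}$ using only modus
ponens, for any formulas $\mathscr{A}$, $\mathscr{B}$ and set of formulas $\Gamma$.
(Hint. Induction on the length of proof of $\mathscr{B}$ from $\Gamma \cup \{\mathscr{A}\}$, using the
results above.)

I.37. Proof by contradiction in “propositional” $\Lambda_2$. Prove that if $\Gamma, \lnot\mathscr{A}$
derives a contradiction in $\Lambda_2$ using only modus ponens,† then $\Gamma \vdash_2 \mathscr{A}$
using only modus ponens, for any formulas $\mathscr{A}$ and set of formulas $\Gamma$.
Also prove the converse.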


† That is, it proves some $\mathscr{B}$ but also proves $\lnot\mathscr{B}$.

We can now prove the completeness theorem (Post’s theorem) for the
“propositional segment” of $\Lambda_2$, that is, the logic $\Lambda_3$ (so-called propositional logic,
or propositional calculus) obtained from $\Lambda_2$ by keeping only the “propositional
axioms” (1)-(4) and modus ponens, dropping the remaining axioms and
the $\exists$-introduction rule.

Note. It is trivial that if $\Gamma \vdash_3 \mathscr{A}$, then $\Gamma \vdash_2 \mathscr{A}$.

Namely, we will prove that, for any $\mathscr{A}$ and $\Gamma$, if $\Gamma \models_{\mathrm{Taut}} \mathscr{A}$, then $\Gamma \vdash_3 \mathscr{A}$.
First, a definition:

I.9.1 Definition (Complete Sets of Formulas). A set $\Gamma$ is complete iff for
every $\mathscr{A}$, at least one of $\mathscr{A}$ or $\lnot\mathscr{A}$ is a member of $\Gamma$.

I.38. Let $\Gamma \nvdash_3 \mathscr{A}$. Prove that there is a complete $\Delta \supseteq \Gamma$ such that also $\Delta \nvdash_3 \mathscr{A}$.
This is a completion of $\Gamma$.
(Hint. Let $\mathscr{F}_0, \mathscr{F}_1, \mathscr{F}_2, \ldots$ be an enumeration of all formulas. There is
such an enumeration, right?
Define $\Delta_n$ by induction on $n$:

$$
\Delta_0 = \Gamma, \qquad
\Delta_{n+1} =
\begin{cases}
\Delta_n \cup \{\mathscr{F}_n\} & \text{if } \Delta_n \cup \{\mathscr{F}_n\} \nvdash_3 \mathscr{A} \\
\Delta_n \cup \{\lnot\mathscr{F}_n\} & \text{otherwise}
\end{cases}
$$

To make sense of the above definition, show the impossibility of having
both $\Delta_n \cup \{\mathscr{F}_n\} \vdash_3 \mathscr{A}$ and $\Delta_n \cup \{\lnot\mathscr{F}_n\} \vdash_3 \mathscr{A}$. Then show that $\Delta =
\bigcup_{n \ge 0} \Delta_n$ is as needed.)

I.39. (Post.) If $\Gamma \models_{\mathrm{Taut}} \mathscr{A}$, then $\Gamma \vdash_3 \mathscr{A}$.
(Hint. Prove the contrapositive. If $\Gamma \nvdash_3 \mathscr{A}$, let $\Delta$ be a completion
(Exercise I.38) of $\Gamma$ such that $\Delta \nvdash_3 \mathscr{A}$. Now, for every prime formula
(cf. I.3.1, p. 29) $\mathscr{P}$, exactly one of $\mathscr{P}$ or $\lnot\mathscr{P}$ (why exactly one?) is in $\Delta$.
Define a valuation (cf. I.3.4, p. 30) $v$ on all prime formulas by

$$
v(\mathscr{P}) =
\begin{cases}
0 & \text{if } \mathscr{P} \in \Delta \\
1 & \text{otherwise}
\end{cases}
$$

Of course, “0” codes, intuitively, “true”, while “1” codes “false”.
To conclude, prove by induction on the formulas of Prop (cf. I.3.2, p. 29)
that the extension of $v$, $\bar{v}$, satisfies, for all formulas $\mathscr{B}$, $\bar{v}(\mathscr{B}) = 0$ iff
$\mathscr{B} \in \Delta$. Argue that $\mathscr{A} \notin \Delta$.)

I.40. If $\Gamma \models_{\mathrm{Taut}} \mathscr{A}$, then $\Gamma \vdash_2 \mathscr{A}$.

I.41. For any formula $\mathscr{F}$ and set of formulas $\Gamma$, $\Gamma \vdash_1 \mathscr{F}$ iff $\Gamma \vdash_2 \mathscr{F}$.

I.42. Compactness of propositional logic. We say that $\Gamma$ is finitely satisfiable (in
the propositional sense) iff every finite subset of $\Gamma$ is satisfiable (cf. I.3.6,
p. 31). Prove that $\Gamma$ is satisfiable iff it is finitely satisfiable.
(Hint. Only the if part is non-trivial. It uses Exercise I.39. Further hint: If
$\Gamma$ is unsatisfiable, then $\Gamma \models_{\mathrm{Taut}} \mathscr{A} \land \lnot\mathscr{A}$ for some formula $\mathscr{A}$.)


II


The Set-Theoretic Universe, Naively 


This volume is an introduction to formal (axiomatic) set theory. Putting first 
things first, we are attempting in this chapter to gain an intuitive understanding 
of the “real” universe of sets and the process of set creation (that is, what we 
think is going on in the metatheory). After all, we must have some idea of what 
it is that we are called upon to codify and formally describe before we embark 
upon doing it. 

Set theory, using as primitives the notions of set (as a synonym for
“collection”), atom (i.e., an object that is not subdivisible, not a collection), and the
relation belongs to ($\in$), has sufficient expressive power to serve as the
foundation of all mathematics. Mathematicians use notation and results from set
theory in their everyday practice. We call the sets that mathematicians use the 
“real sets” of our mathematical intuition. 

The exposition style in this chapter, true to the attribute “naive”, will be rather 
leisurely to the extent that we will forget, on occasion, that our “Chapter 0” 
(Chapter I) is present. 


II.1. The “Real Sets” 


Naively, or informally, set theory is the study of collections of “mathematical 
objects”. 


† It is our experience that readers of books like this one often choose to ignore “Chapter 0” initially.
Invariably they are compelled to acknowledge its existence sooner or later in the course of the
exposition. This will probably happen as early as Chapter III in our case.

II.1.1 Informal Description (Mathematical Objects). Set theory is only
interested in mathematical objects. As far as set theory is concerned, such objects
are either

(1) atomic (let us understand by this term an object that is not a collection of
other objects), such as a number or a point on a Euclidean line, or
(2) collections of mathematical objects.


The foregoing description of “mathematical object” is inductive — describing 
the notion in terms of itselfi — and, as all inductive descriptions do, it implies 
a formation of such objects, from the bottom up, by stages (cf. 1.2.9). That 
is, we start with atoms.t We may then collect atoms to form all sorts of first 
level collections, or sets as we will normally say. We may proceed to collect 
any mix of atoms and first-level sets to build new collections — that is, second 
level sets — and so on. Much of what set theory does is to attempt to remove the 
fuzziness from the foregoing description, and it does so by logically developing 
the properties of these sets. 


11.1.2 Example. Thus, at the beginning we have all the level-0, or type-0, 
objects available to us. For example, atoms such as 1, 2, 13, /2 are available. 
At the next level we can include any number of such atoms (from none at all 
in one extreme, to all available atoms in the other extreme) to build a set, that 
is, anew mathematical object. Allowing the usual notation, i.e., listing within 
braces what we intend to include, we may cite a few examples of level-1 sets: 


L1-1. {}. Nothing listed. This set has the standard notation @, and is known as 
the “empty set’. 

L1-2. {1}. 

L1-3. {1, 1}. 

L1-4. {1, 2}. 

L1-5. {/2, 1}. 


Pause. Are the sets that we have displayed under L1-2 and L1-3 the same? 
(I mean, equal?) Same question for the sets under L1-4 and L1-5. Our “un- 
derstanding” is — gentle way of saying “we postulate” — that set equality is 


— 


Taking for granted an understanding of the terms “atom” and “collection” as intuitively self- 
explanatory, we use them to describe the objects that set theory studies. We are purposely leaving 
out a description of what “mathematical” is supposed to mean. Suffice it to say that experience 
provides numerous examples of mathematical objects, such as numbers of all sorts, points, lines, 
vectors, matrices, groups, etc. Of course, one needs an experiential understanding of atomic 
mathematical objects only, since all the others are built from those as described in II.1.1. 
Atoms are very often called “urelements”, pronounced “tir-élements” — an anglicized form of the 
German word Urelemente — “primeval elements”. 
“forgetful of structure” such as repetition or permutation of elements. This
understanding will soon be formally codified by choosing an appropriate axiom
for set equality.
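
The “forgetfulness” just described can be tried out on small examples. What follows is only a minimal sketch, assuming that Python's built-in frozenset is an acceptable stand-in for finite sets of atoms; the variable names are illustrative and play no role in the formal development.

```python
# Finite sets of atoms, modelled (as an assumption of this sketch) by frozenset.
one_listed_once = frozenset([1])       # {1}
one_listed_twice = frozenset([1, 1])   # {1, 1}
in_one_order = frozenset([1, 2])       # {1, 2}
in_other_order = frozenset([2, 1])     # {2, 1}

# Repetition is forgotten: {1} and {1, 1} are the same set.
repetition_forgotten = one_listed_once == one_listed_twice

# Permutation is forgotten: {1, 2} and {2, 1} are the same set.
permutation_forgotten = in_one_order == in_other_order

# Nesting is not forgotten: the level-2 set {{1}} differs from the level-1 set {1}.
nesting_matters = frozenset([frozenset([1])]) != frozenset([1])

print(repetition_forgotten, permutation_forgotten, nesting_matters)
```

Only the listing conventions are forgotten; the level of nesting of the braces still distinguishes objects.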


We already can identify a few level-2 objects, using what (we already know) 
is available. 


Note how the level of nesting of { }-brackets matches the level of the objects.


L2-1. {∅}.
L2-2. {1, {1}}.
L2-3. {{√2, 1}}.


II.1.3 Informal Definition. A set is a non-atomic mathematical object, as the
latter is described in II.1.1 (p. 99). 


The above is not a mathematical definition, because it is not precise. It is only 
an understanding on which we will subsequently base our choice of axioms. 
We do not need to attempt to search for the “real, definitive ontology” of sets 
(whatever that may mean) in order to do set theory, any more than we bother to 
search for the real ontology of “number” or “point” before we allow ourselves 
to do number theory or geometry, respectively. 

From the mathematical point of view we are content to have tools (axioms 
and rules of logic) that tell us how sets behave rather than what sets are — 
entirely analogously with our attitude towards points and lines when we do 
axiomatic geometry, or towards numbers when we do axiomatic arithmetic 
(see, for example, our development of Peano arithmetic in volume 1 of these
lectures).


Nevertheless, we will accept throughout this volume the previous (inductive) 
intuitive description of sets (II.1.3), doing so not because of some deep
philosophical conviction, but in the sense that we will let this accepted‡ ontology
guide us to choose reasonable axioms. 


‡ It cannot be emphasized strongly enough that “accepted” is a very important verb here. Different
descriptions/ontologies of sets may be possible; for example, one that denies Principle 1 below.
Compare with the similar situation in geometry. It is possible to imagine different types of
geometry (Euclidean on one hand, and various non-Euclidean ones on the other), but one is free
to say “I will accept Euclidean geometry as the ‘true’ depiction of the universe and then proceed
to learn its theorems”. All that the latter acceptance means is a decision to study a particular type
of geometry.
For this process to be effective we have to understand some of the fine points
of II.1.3. Thus we begin by “unwinding” the induction into an iteration. We
obtain the following two principles of set formation that are taken as “obvious 
truths”: 


Principle 0. We can form, or build, sets by stages as follows: At stage 0 we 
acknowledge the presence of atoms. At each subsequent stage we may form 
a mathematical object — a set — by collecting together (mathematical) objects 
provided these are available to us from previous stages. 


Principle 0 is worded so that it leaves open the possibility that there are some 
sets that are obtained outside this formation process. However, our accepted 
inductive definition of sets (II.1.3) requires the following as well:


Principle 1. Every set is built at some stage. 


II.1.4 Remark. Principle 1 is too strong. Omitting it does not affect the
applicability of set theory to mathematics, i.e., the status of the former as the
“foundation” of the latter. Of course, we cannot omit this principle unless we
modify the descriptions II.1.1 and II.1.3 (for reasons analogous to the
phenomenon described in I.2.9).
Now, if Principle 1 holds, as it does under our assumptions, then it leads
to the foundation axiom. This comment will make much more sense later. For
now, if you have just read it you have done so at your own risk.


The following subsidiary (and delightfully vague) principle is important 
enough to be listed: 


(Subsidiary) Principle 2. If our intuition will accept the existence of a stage
(let us call it Σ) that follows all the (earliest) stages of construction (as a set)
of each non-atomic member of some collection A, then A is a mathematical
object, and hence is a set (A is not atomic, being a collection). The reason: By
invoking Principle 0 we can build A at stage Σ.


† Not less “obvious” than II.1.3, from which they follow directly. The reader may peek once more
into I.2.9 for motivation, forewarned though that the stages of set formation are “far too many”
to be numbered solely by natural numbers.

By the way, we do not normally speak of formation of atoms. Atoms are given outright. It is 
sets that we build. 
II.1.5 Remark. (1) We are not saying above that stage Σ is the “earliest” stage
at which A can be built, since we have said “follows” rather than “immediately 
follows”. 

(2) If some set is definable (“buildable”) at some stage Σ, then we find
it both convenient and intuitively acceptable to agree that it is also definable 
at any later stage as well. This corresponds to the common experience that a 
theorem has proofs of various lengths; once a “short” proof has been given, 
then — for example by adding redundant axioms in this proof — we can lengthen 
it arbitrarily and yet still have it yield the same theorem. 

(3) “If our intuition will accept...”. This condition in Principle 2 creates 
some difficulty. Whose intuition? What is acceptable to some might not be to 
others. 

This is a problem that arises when one does one’s mathematics like a 
Platonist. A Platonist accepts some “obvious truths” about mathematical ob- 
jects, and then proceeds to discover some more truths by employing (infor- 
mal) logical deductions. Most practising mathematicians practise their craft like 
Platonists (whether they are card-carrying Platonists or not). 

The catch with this approach, especially when applied to something “big” — 
by this I mean “foundational” — like set theory, is that one cannot always syn- 
chronize the understandings of all Platonists as to what are the “obvious truths” 
(about sets) — from where all reasoning begins to flow. There was a time not 
too long ago, for example, that mathematicians, otherwise comfortable with 
infinite sets, were not unanimous on whether the set-theoretic principle known 
as the axiom of choice was valid or not. 

In the end, we avoid this difficulty by adopting the axiomatic approach to 
set theory. The Platonist within each of us may continue thinking of the sets 
that were imperfectly described in II.1.3 as the “real sets”: the ones that,
Platonistically speaking, “exist”. However, we plan to learn about sets by argu- 
ing like formalists. That is, we will translate a few obvious and important truths 
about real sets into a formal language (these translations will lead to our axiom 
schemata) and then employ first order logic as our reasoning tool to learn about 
real sets, indirectly, by proving theorems in our formal language.†

Thus, once the imprecise set-formation-by-stages thesis has motivated the 
selection of the above-mentioned “few obvious and important truths”, it will 


† The indirection occurs because in this language we will use terms to represent or codify real sets,
and formulas to represent or codify properties of real sets. The reader who has read volume 1 is 
by now familiar with this approach, which we applied there in Chapter II to the study of (Peano) 
arithmetic. For terminology — such as “formal language”, “term”, “formula”, “metatheory” — and 
tools from logic, the reader is referred to Chapter I of the present volume. 
never be invoked again. Indeed, the opposite will happen. Our axioms will be
strong enough to precisely define (eventually) what stages are and what happens 
at each stage, something that we are totally powerless to do now. 


Another criticism of the Platonist’s approach to set theory is that it may entail 
contradictions (often called antinomies or paradoxes) which are hard to work 
around. Such paradoxes come about in the Platonist approach because it is not 
always clear what is a safe “truth” that we can adopt as a starting point of our 
reasoning. For example, is the following a “safe truth”? “For any property ‘𝒜’
we can build a set of all the objects x that satisfy 𝒜.” We look into this question
in the next section. We also ponder briefly, through an example immediately 
below, the nature of set-building, or set-defining, “properties”. 


A bit on terminology here: Some people call the contradictions of naive set 
theory “antinomies” (e.g., the Russell antinomy), and the harmless pleasantries 
of the Berry type “paradoxes”. Others, like ourselves, use just one term, para- 
doxes. The reader may wish to decide for himself on the choice of terminology 
here, given that both words are rooted in Greek and a paradox is something 
that is “against one’s belief” or even “against one’s knowledge” (δοκῶ = “I
believe”, or “I know”), while antinomy means being “against the (here, logical
or mathematical) law” (νόμος = “(the) law”).

By the way, Berry’s paradox is this: Define n by “n is a positive integer 
definable using fewer than 1000 non-blank symbols of print”.† Examples of
possible values of n: “5”, “10”, “10 raised to the power 350000”, “the smallest 
prime number that has at least 10 raised to the power 350000 digits”. 

Now, the set of such numbers is finite, since there are finitely many ways to 
write a definition employing fewer than 1000 non-blank symbols. Thus, there 
are plenty of positive integers that are not so definable. Let m denote the smallest 
such. 


Then “m is the smallest positive integer not definable using fewer than 1000 
non-blank symbols of print”. 

Hey, we have just defined m in fewer than 1000 non-blank symbols of print.
A contradiction!‡
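
The finiteness claim in Berry's paradox is a plain counting argument, which can be sketched as follows. The function name and the figure of 95 printable symbols are assumptions made for illustration only; the text requires nothing more than a finite alphabet. Since each candidate “definition” is a string of fewer than 1000 symbols, the number of such strings bounds the number of integers that can be so defined.

```python
def count_strings_shorter_than(alphabet_size: int, max_length: int) -> int:
    """Number of strings of length 0, 1, ..., max_length - 1 over the alphabet."""
    return sum(alphabet_size ** k for k in range(max_length))

# Hypothetical alphabet size; any finite value leads to the same conclusion.
SYMBOLS = 95

# An upper bound on how many positive integers can have a definition
# written with fewer than 1000 symbols (most strings define nothing at all).
bound = count_strings_shorter_than(SYMBOLS, 1000)

print(len(str(bound)), "decimal digits in the bound")
```

The bound is astronomically large but finite, so infinitely many positive integers are left over, and a smallest one among them exists.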


II.1.6 Remark. It should be pointed out that our Platonist’s view of “real sets” 
is informed by the work of Russell (and the later work of von Neumann), 


† There is an implicit understanding that the set of all available symbols of print is finite: e.g.,
nowadays we could take as such the set of symbols on a standard English computer keyboard.

‡ Well, not really. Neither of the statements “n is a positive integer definable using fewer than
1000 non-blank symbols of print” or “m is the smallest positive integer not definable using fewer
than 1000 non-blank symbols of print” is a definition. What does “definable” mean?
namely, by his suggested “fix” for the paradox that he discovered (see the next
section). Georg Cantor, the founder of set theory, did not require any particular
manner, or order, in which sets are formed. The axioms of the ZFC set theory
of Zermelo and Fraenkel describe the von Neumann universe, which is built
by stages, rather than the Cantorian universe. In the latter, as many sets can be
present at once as our thought or perception will allow.†


II.2. A Naive Look at Russell’s Paradox 


Let us ponder an elementary but fundamental example of what sort of contra- 
dictions might occur in the informal approach. 


II.2.1 Example.‡ Let us recall (from Chapter I or from our previous
mathematics courses) that the notation


S = {x : 𝒜[x]}     (1)


denotes (naively) the set S of all objects x that satisfy the formula 𝒜[x].§ This
means that entrance into S is determined by


x ∈ S iff 𝒜[x]     (2)


where, of course, by “x ∈ S” we mean “x is a member of S”.
Let us see why the “Russell set”


R = {x : x ∉ x}     (3)

is bad news for the informal approach: By (2), (3) yields

x ∈ R iff x ∉ x     (4)


Now, since the variable x can receive as value any object of the theory, in
particular it can receive the “set” R. Thus, (4) yields the contradiction


R ∈ R iff R ∉ R     (5)

Our only way out of the contradiction (5) is to say that


R is not a set.¶


† In Cantor’s own description, a set is any collection into “a whole” of objects of our “perception
or of our thought”.

‡ Reminder: This is at the informal level.

§ The square and round bracket notation is introduced in I.1.11.

¶ This saves the theory, for now, since then it is “illegal” to plug R into the set/atom variable x;
hence (5) will not be derived from (4).
Here is what happened: We have obtained outrageously many x’s, each
satisfying x ∉ x. We then decided to collect them all, and build a set R. Our
blunder was that we did not verify that Principle 2 (p. 102) applied to R.

No checking, no right to claim sethood for R! 

Thus, the fact that R is not a set is neither a surprise nor paradoxical. Ap- 
parently we have run out of stages. By the time all the x’s were built, there was 
no next stage left at which we could collect them all into a set R. 


You are shaking your head. But consider this: x ∈ x has to be false for any
object x. Indeed, it is trivially false for atomic x. For non-atomic x, in order to
build the copy to the right of “∈” I must first have (at an earlier stage†) the x to
the left of “∈” (since it is a member of the collection x).

Thus x ∉ x is true for all objects x. But then R contains everything, for the
entrance condition in (3) is always true. No wonder there were no stages left to 
build R. We have used them all up building the x’s! 
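
A finite toy version of this argument can be run directly. The sketch below is only an illustration under loud assumptions: there is a single atom, only finitely many stages are built, and the helper names (all_subsets, stage, is_member) are invented here. In such a finite model a later stage always exists, so the Russell collection of one stage does show up at the next; what the sketch shows is that it is never available at the stage whose objects it collects, which is the finite shadow of “running out of stages”.

```python
from itertools import combinations

def all_subsets(objects):
    """Every subset of a finite collection, each as a frozenset."""
    objects = list(objects)
    return {frozenset(c) for r in range(len(objects) + 1)
            for c in combinations(objects, r)}

def stage(atoms, k):
    """Objects available at stage k: the atoms, plus every set whose members
    were all available at stage k - 1 (a finite reading of Principle 0)."""
    available = set(atoms)
    for _ in range(k):
        available = set(atoms) | all_subsets(available)
    return available

def is_member(x, y):
    """x ∈ y, which is false whenever y is an atom."""
    return isinstance(y, frozenset) and x in y

universe = stage({1}, 2)  # the single atom 1, two rounds of set building

# Collect every available x with x ∉ x.
russell = frozenset(x for x in universe if not is_member(x, x))

everything_collected = russell == frozenset(universe)  # x ∉ x holds for all x
built_too_late = russell not in universe               # not available at stage 2
available_next = russell in stage({1}, 3)              # but available at stage 3
```

In the full universe there is no stage after all the stages, so the analogue of the last line has nothing to appeal to.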


Now that we realize that some collections such as S in (1) above are sets, and 
some are not, how can we tell which is which? The axiomatic approach resolves 
such issues in an elegant way. 


II.3. The Language of Axiomatic Set Theory 


Having taken our foregoing terse description of how sets are built — by stages — as 
our (Platonist) view of what sets really are, we now want to avoid embarrassing 
paradoxes and to turn the theory into a consistent deductive science. The obvious 
approach is to translate or codify naive set theory into a formal first order theory, 
in the sense of Chapter I. We begin by choosing a formal first order language,
L_Set.


L_Set has the standard logical symbols, namely,


† “Hmm”, the alert reader will say. “You are using Principle 1 here. You are saying that if x is a
non-atomic mathematical object, then it must be built at some stage.” Indeed! However, even if we
were to totally abandon Principle 1 and revise our naive picture of the universe of “mathematical
objects” to allow x ∈ x to be true (depending on the “value” of x), we could still avoid the
Russell paradox argument in exactly the same way we avoid it in the presence of Principle 1:
Namely, by restricting the circumstances where the “operation” {x : 𝒜[x]} is allowed to build
a set. In short, it is not the choice of an answer to the question “x ∈ x?” that creates the Russell
paradox, rather it is a comprehension principle, {x : 𝒜[x]}, that is far too powerful for its own
good.
and object variables, that is, variables that when interpreted are interpreted to
take as values (real) sets or atoms,†


v_0, v_1, ..., v_j, ...


Additionally, L_Set has the two primitive nonlogical symbols “∈”‡ and “U”.§
The former is a binary predicate that is intended to mean (when interpreted) “is
a member of”. The latter is a unary predicate meant to say (of its argument) “is
an atom”. All the remaining familiar symbols of set theory (e.g., ∩, ∪, ⊂, ×)
are introduced as defined nonlogical symbols as the theory progresses.


Of course, exactly as in Chapter I, one introduces, in the interest of convenience, 
defined logical symbols, namely, ∀, ∧, →, ↔.


The logical axioms and rules of first order logic will be those that we have 
introduced in Chapter I. 

Our intended “standard model” (i.e., what we are describing by our formal
system) is the already (imperfectly) described “universe” of all sets and atoms.¶
Having here a standard model in mind, which the axiomatic theory attempts
to describe correctly and completely,‖ is entirely analogous to what we did in
volume 1.

There we had the standard model of arithmetic, 𝔑 = (ℕ; S, +, ×, 0, <),
in mind, and each of the axiomatizations introduced, ROB and PA, was a
successive attempt at formally deducing all the true formulas of 𝔑 from a few
axioms.††


† This is the implementation of our intentions regarding the nature of “mathematical objects”
(II.1.1).

‡ “∈” is a stylized form of ε (epsilon), the first letter of the ancient Greek word “ἐστί”
(pronounced estí, with a short “i”, and meaning “is”). Thus, if y is the set of all even integers,
x ∈ y says that “x is an even integer”. Some authors still use x ε y instead of x ∈ y, but we
prefer not to do so, as “ε” is overused (e.g., empty string, epsilon number, Hilbert selector, a
major contributor to the dreaded “ε-δ” proofs of calculus, etc.).

§ “Primitive” means “primeval” or “given at the very beginning”.

¶ Known as the von Neumann universe. That this universe is not a set (it is equal to the Russell
collection R granting Principle 1, is it not?) is an issue we should not worry about, as long as
we accept that its members are all the sets and atoms.

‖ The terms correctness and (syntactic or simple) completeness of a theory are defined in I.8.2
and I.8.3. The former means that every theorem is true when interpreted in the standard model.
The latter means that all formulas that are true in the standard model are theorems. We have
no difficulty with the former requirement. The latter is impossible by Gödel’s incompleteness
theorems (I.8.5).

†† Again, we could not produce all such formulas, because of Gödel’s incompleteness theorems.
The choice of the intended model influences the choice of (nonlogical)
axioms. We will adopt in this book the Zermelo-Fraenkel axioms (ZF) with 
the axiom of choice as an additional axiom. This system is known as ZFC. 


II.3.1 Remark. To an observer we will appear to behave like formalists,
manipulating sets and their properties in a finitistic manner, writing proofs
within a first order theory.

Sets will be† just terms of our language, thus finite symbol sequences!
Properties of sets will also be finite objects, the formulas of the language. Finally,
proofs themselves are finite objects, being finite sequences of formulas.

We do not have to take sides or disclose where our loyalties lie (Platonist
vs. formalist camp), as such disclosure is functionally irrelevant. What really
matters is how we act when we form deductions.‡


The definitions of terms and formulas for L_Set are those given in Chapter I
(I.1.5 and I.1.8) subject to the restriction that the only primitive nonlogical
symbols are the two predicates ∈ and U.


II.3.2 Remark (Basic Language). Thus, the terms of L_Set are just the variables,
v_0, v_1, v_2, ....

Formulas are built from the atomic ones, that is, Uv_i, v_i = v_j, and v_i ∈ v_j
for all choices of i, j in ℕ, by application of the connectives ¬, ∨, and ∃ (I.1.8).


We call L_Set the basic or primitive language of set theory. The qualifiers
“basic” and “primitive” reflect the fact that the only nonlogical symbols are the
primeval ∈ and U. As the theory is being developed, we will frequently introduce
new defined symbols, thus extending L_Set (cf. Section I.6). This process also
enlarges the variety of terms (adding terms such as ∅, {x : ¬x = x}, ω, etc.)
and formulas.

We note that the definitions of terms and formulas of L_Set are strictly about
syntax, i.e., correct form. Thus they do not concern themselves with semantic
issues or provability issues. In particular, it is good form to write “v_2 ∈ v_2”,
even if one of our axioms will entail that “v_2 ∈ v_2” is a false statement.§
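
The purely syntactic character of these definitions can be made concrete with a small data structure. The following is a minimal sketch, not the formal definition of Chapter I: the class names (Var, IsAtom, Eq, In, Not, Or, Exists) and the functions is_term, is_formula and show are all invented for illustration, and the bracketing produced by show is just one convenient choice.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Var:
    index: int            # the variable v_index

@dataclass(frozen=True)
class IsAtom:
    arg: Var              # U v_i

@dataclass(frozen=True)
class Eq:
    left: Var
    right: Var            # v_i = v_j

@dataclass(frozen=True)
class In:
    left: Var
    right: Var            # v_i ∈ v_j

@dataclass(frozen=True)
class Not:
    body: object          # ¬A

@dataclass(frozen=True)
class Or:
    left: object
    right: object         # A ∨ B

@dataclass(frozen=True)
class Exists:
    var: Var
    body: object          # (∃v_i)A

def is_term(t):
    """In the basic language the only terms are the variables."""
    return isinstance(t, Var) and isinstance(t.index, int) and t.index >= 0

def is_formula(f):
    """Purely syntactic check; it says nothing about truth or provability."""
    if isinstance(f, IsAtom):
        return is_term(f.arg)
    if isinstance(f, (Eq, In)):
        return is_term(f.left) and is_term(f.right)
    if isinstance(f, Not):
        return is_formula(f.body)
    if isinstance(f, Or):
        return is_formula(f.left) and is_formula(f.right)
    if isinstance(f, Exists):
        return is_term(f.var) and is_formula(f.body)
    return False

def show(f):
    """Render a term or formula as a string, with generous bracketing."""
    if isinstance(f, Var):
        return f"v_{f.index}"
    if isinstance(f, IsAtom):
        return f"U{show(f.arg)}"
    if isinstance(f, Eq):
        return f"{show(f.left)} = {show(f.right)}"
    if isinstance(f, In):
        return f"{show(f.left)} ∈ {show(f.right)}"
    if isinstance(f, Not):
        return f"¬({show(f.body)})"
    if isinstance(f, Or):
        return f"({show(f.left)}) ∨ ({show(f.right)})"
    if isinstance(f, Exists):
        return f"(∃{show(f.var)})({show(f.body)})"
    raise TypeError("not a term or formula of the basic language")

self_membership = In(Var(2), Var(2))
print(show(self_membership), is_formula(self_membership))
```

The checker accepts the self-membership formula because it inspects form only; whether the formula is true or provable is a separate matter.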


† “Be” is used here in formalist jargon. The Platonist terminology is “be denoted by”.

‡ A true formalist would probably declare that the sets of our intuition do not really “exist”
(mathematically speaking) and sets just are the terms of our formal language. See Bourbaki
(1966b, p. 62), where it is stated, in translation, that “[ . . . ] the word ‘set’ will be strictly considered
to be a synonym for ‘term’; in particular, phrases such as ‘let x be a set’ are, in principle, totally
superfluous, since every variable is a term; such phrases are simply introduced to help the intuitive
interpretation of [formal] texts”.

§ We have already remarked in II.2.1 that x ∈ x is false in our intended universe.
II.3.3 Remark (Notational Liberties). In practice we use abbreviations
(in the metalanguage) in order to enhance readability. The reader may wish
to review the metalinguistic argot introduced in Chapter I, in particular the
agreement that calligraphic upper case (Latin) letters stand for formulas (see
Remark I.1.9 for more on this) while t, s, r typically are metasymbols for
arbitrary terms.


We also take liberties with the correct syntax of formulas and terms, writing 
them down in abbreviated more readable form. One type of abbreviation has to 
do with reducing the number of brackets that we use when we write formulas. 
This has been discussed in Chapter I. 

We also have metalinguistic abbreviations for the variables we use. Instead 
of the cumbersome v_234777, v_90, etc., we adopt the convention that any lower or
upper case single Latin letter, with or without subscripts or primes, will denote 
an object variable. 


We will prefer to name variables using letters near the end of the alphabet. 
Nevertheless, we will often introduce variables such as A, b, c, or even go to 
Greek and German (Fraktur) alphabets to obtain names for variables, such as 
α, β and 𝔪, 𝔠.

We will almost never write down a well-formed formula of set theory (except 
for the purpose of mocking its unfriendliness and awkwardness). We will prefer 
“translations” of the formula in our argot, where abbreviations of various sorts, 
and natural language, are allowed. This renders the formula easier to read and 
comprehend. 


II.3.4 Example. Picking up the last comment above, we show here two exam-
ples of what the judicious use of English saves us from: 


(a) We would sooner say “n is a natural number” than write the set theory 
formula 


(∀x)(∀y)(x ∈ y ∈ n → x ∈ n) ∧
[n = ∅ ∨ (∃x)(¬U(x) ∧ n = x ∪ {x})] ∧
(∀m)(m ∈ n → {(∀x)(∀y)(x ∈ y ∈ m → x ∈ m) ∧
[m = ∅ ∨ (∃x)(¬U(x) ∧ m = x ∪ {x})]})


It should be noted that the above is already abbreviated. It contains the
defined symbols ∅, ∪ and {x}, not to mention that the variables used were


† “Abbreviated” is not always shorter. +x×yz is shorter than x + (y × z), and f t_1 t_2 t_3 is shorter than
f(t_1, t_2, t_3). Yet the longer forms are easier to understand. An abbreviation here is an alternative,
easier to understand form.
written in argot and that we employed logical abbreviations such as →, ∀,
etc., and brackets of various shapes.

(b) If we are in number theory (arithmetic) we would sooner state “n is a prime”
than


n > S0 ∧ (∀x)(∀y)(n = x × y → x = S0 ∨ x = n)
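
The content of the formula in (a) can be checked mechanically on small sets. The sketch below rests on several assumptions that should be kept in view: it reads the formula as saying that n is transitive, that n is ∅ or a successor x ∪ {x}, and that every member of n has the same two properties; it works only with finite pure sets modelled by frozenset, so the clause ¬U(x) holds trivially; and all function names are invented for illustration.

```python
EMPTY = frozenset()

def successor(x):
    """x ∪ {x}."""
    return x | frozenset([x])

def is_transitive(s):
    """(∀x)(∀y)(x ∈ y ∈ s → x ∈ s): every member of s is a subset of s."""
    return all(y <= s for y in s)

def is_zero_or_successor(s):
    """s = ∅ ∨ (∃x)(s = x ∪ {x}); any such x is itself a member of s,
    so it suffices to search among the members of s."""
    return s == EMPTY or any(s == successor(x) for x in s)

def is_natural(n):
    """The three conjuncts of the formula, restricted to finite pure sets."""
    return (is_transitive(n)
            and is_zero_or_successor(n)
            and all(is_transitive(m) and is_zero_or_successor(m) for m in n))

def von_neumann(k):
    """The set obtained from ∅ by applying successor k times."""
    n = EMPTY
    for _ in range(k):
        n = successor(n)
    return n

# A transitive set that is neither ∅ nor a successor: {∅, {∅}, {{∅}}}.
odd_one = frozenset([EMPTY, frozenset([EMPTY]), frozenset([frozenset([EMPTY])])])

print([is_natural(von_neumann(k)) for k in range(4)], is_natural(odd_one))
```

Even in this executable form the definition is far less readable than the English phrase “n is a natural number”, which is the point of the example.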


II.4. On Names 


The reader is referred to Section I.1 (in particular see the discussion starting 
with Remark I.1.3 on p. 10) so that we will not be unduly repetitive in the present 
section. 


II.4.1 Remark (The Last Word on “Truth”). The completeness theorem
shows that the syntactic apparatus of a first order (formal) logic totally captures
the semantic notion of truth, “modulo” the acceptance as true of any given
assumptions, Γ. This justifies the habit of the mathematician (even of the formalist;
see Bourbaki (1966b, p. 21)) of saying, in the context of any given theory Γ,
“it is true”, meaning “it is a Γ-theorem”, or “it is Γ-proved”; “it is false”,
meaning “the negation is a Γ-theorem”; “assume that 𝒜 is true”, meaning “add the
formula 𝒜 to Γ as a nonlogical axiom”; and “assume that 𝒜 is false”,
meaning “add to Γ the formula ¬𝒜 as a nonlogical axiom”.


There is another meaning (and use) of “true” which is not equivalent to 
deducibility. This is what we have called the “really true”, meaning what is true 
in the intended, or standard, model. 

The Gödel incompletableness phenomenon tells us that strong theories like
set theory or arithmetic will never† have deducibility coincide with “real truth”.
This is because there will always be sentences 𝒜 that are neither provable nor
refutable, yet one of 𝒜 and ¬𝒜 surely is “really true”!

We plan to abandon the qualifier “real” (as we promised in an earlier footnote) 
and the quotes around true (in the standard model). To avoid confusion with 
the “other” true (= deducible) we will do the following: 


Whenever we mean “is proved” or “is provable”, we just say so. We will not
say “is true” instead.


It will be convenient (and it is standard practice) to use the symbol sequences 
that are terms of the formal theory as names for their counterparts, real sets of the 


† “Never” as long as all consistent augmentations of the set of axioms preserve the set’s
recursiveness (or “recognizability”).


metatheory. For example, in the metatheory we may say “the set {x : ¬x = x}”,
thus using the symbol sequence “{x : ¬x = x}” to name some appropriate real
set, the so-called empty set.†


This correspondence between certain terms‡ of the type “{x : 𝒜[x]}” and real
sets is nothing else than an application of first order definability (cf. I.5.15).
That is, if some real set A is first order definable in the standard model by a
formula 𝒜, we have


x ∈ A iff 𝒜(x) is true in the standard model


Thus the symbol sequence 𝒜, or more suggestively the symbol sequence
{x : 𝒜(x)}, can name the set A. As we know, the latter sequence is pronounced
“the set of all x such that 𝒜(x) is true”.


Reciprocally, in our argot, we nickname formal terms and their formal
abbreviations by the metamathematical (often English) names of the sets that
they name. Thus, e.g., we say that {x : ¬x = x}, or ∅, is “the empty set of the
(formal) theory”.


We note two limitations of this naming apparatus below. 


II.4.2 Remark (Limitations of our Naming Mechanism).


(a) Inconvenience. This stems from the fact that formal terms, even for very
simple sets, can be horrendously long, and thus can be quite incomprehen-
sible. For this reason we almost always introduce, via formal definitions,
short names for such terms (formal abbreviations) that we just make up;
that is, we name the formal names by more convenient (shorter) names
that we invent. These shorter defined names become part of the formal
language of the formal theory. For example, the term {x : ¬x = x} is formally
abbreviated by a new (defined) constant symbol, ∅.

(b) Formal limitations. First, we cannot name (that is, first order define)
all the real sets by terms, because there are far more sets than terms that
we can supply via the formal language. We cannot even so define all the
subsets of the set of natural numbers. As a consequence, we cannot codify
all “properties” of sets in our language as formulas either, because there are
far too many properties but too few formulas.§ Second, as if the short supply


† I am guilty here of borrowing from the sequel. “{x : ¬x = x}” is not a term of the basic
language II.3.2; instead it is a defined term, about which we will talk soon.

‡ Russell’s paradox is fresh in our memory; thus, “certain” is an apt qualifier.

§ We cannot gloss over this shortage of names by extending L_Set by the addition of a name for
each real set. That would make our language impractical, as it would make it uncountable, and
of formulas were not limiting enough, Gödel’s first incompleteness theorem
yields another insurmountable limitation. It tells us that in any consistent
axiomatization of set theory through a recognizable set of axioms there will
be infinitely many “true properties” of sets for each of which we do have a
formal name, but nevertheless none of these formal names (formulas of L_Set)
are deducible in the formal theory. Thus the theory can only incompletely
capture “real truth”.


Unlike the limitation of convenience (a), which we have easily circumvented 
above, there is no solution for (b). 

But then, why bother with a formal theory at all? I can state two reasons. 

The first is the precision that such a theory gives to the concept of deduction, 
turning it into a mathematical (finite) object. Thus, questions such as “is our 
set theory free from contradiction?”† or “what is, and what is not, deducible
from what axioms?” become meaningful and can be handled mathematically, 
in principle. 

The above are metatheoretical concerns. The second reason has to do with 
everyday mathematical practice. We benefit from the precision that a formal
theory gives to the praxis of deduction, guarding us against embarrassing
paradoxes that loose arguments or loose assumptions may lead to.


II.4.3 Example. This, like most examples, is in the “real (informal) realm”.
The natural numbers 0, 1, 2, ... when collected together form a set, normally
denoted by ℕ.


We often capture the above sentence informally by writing ℕ = {0, 1, 2, ...}.


II.4.4 Remark. ℕ is a remarkable example of a (real) set, in that we have no
easy way to give a term name for it in set theory. The next best thing to do is,
instead, to find another real set, ω, “isomorphic to ℕ”,‡ that can easily be seen
to have a term counterpart in the formal theory.

Needless to say, as follows from our previous discussion, both the real ω and
the corresponding formal term are denoted by the same symbol, ω.


therefore impossible to generate finitely. In an uncountable language we will not be able to write,
or even check, proofs anymore, as we will have trouble telling what symbols belong to L_Set and
which do not. As a result, we will be unable to know whether an arbitrary string of symbols is a
formula, an axiom, or just rubbish.

† This is the original reason that prompted the development of axiomatic theories.

‡ The reader should not worry about the meaning of “isomorphic”. We will come back to this very
issue in Chapter V.
Notwithstanding the comment regarding ℕ, we will continue employing it,
as well as other familiar sets from the metatheory (such as ℤ (the integers), ℚ
(the rationals), and ℝ (the reals)) in our informal discussions, i.e., in examples,
remarks, “naive” exercises, etc.


II.4.5 Remark. At the outset of Section II.3 we promised “to turn the theory
into a consistent deductive science”.

It may come as a shock to the reader that we have no (generally acceptable)
proof of consistency of ZFC. We Platonistically got around the consistency
question of either ROB or Peano arithmetic by saying “sure they are consistent;
𝔑 is a model of either”, since few reasonable people will feel uncomfortable
about 𝔑 or its fitness to certify consistency (serving as a model).
Notwithstanding this, proof theorists have found alternative constructive proofs of the
consistency of Peano arithmetic and hence of ROB (such proofs can be found
in Schütte (1977) and Shoenfield (1967)). These proofs necessarily use tools
that are beyond those included in Peano arithmetic (because of Gödel’s second
incompleteness theorem).

We have no such constructive proof of the consistency of ZFC. This, of 
course, is not surprising. Since ZFC satisfies Gödel’s second incompleteness
theorem, a proof of its consistency cannot be formalized within ZFC. Here then 
is the difficulty: What will any such consistency proof “outside” or “beyond” 
ZFC look like, considering that 


(a) it cannot be expressed in ZFC, and yet 

(b) ZFC, being the “foundation of all mathematics” (or such that “mathematics 
can be embedded” in it), ought to be able to include (formalizations of ) all 
mathematical tools and mathematical reasoning — including a formalization 
of its consistency proof that was given “outside” ZFC. 


However, most set theorists are willing to accept the consistency of ZFC. 
“Evidence” (but not a proof) of this consistency is, of course, the presence of 
the standard model. 


III 


The Axioms of Set Theory 


III.1. Extensionality 


Under what conditions are two sets equal? 


First of all, if a and b stand for urelements, then a = b just obeys the logical 
axioms of equality (Definition I.3.13, p. 35) and we have nothing to add about 
their behaviour concerning equality. □ 


For sets, however, we require that they be equal whenever they contain 
exactly the same elements, regardless of whatever “structural connections” 
these elements may have. In order to state this axiom formally we use the 
primitive predicate of set theory, U. Thus Ux is intended to mean “x is an 
urelement” (therefore ¬Ux will mean “x is a set”). 

We use the “abbreviation”† “U(x)” for “Ux”, since it is arguable that, in 
general, “P(t₁, ..., tₙ)” is more user-friendly than “Pt₁ ... tₙ”. 


III.1.1 Axiom (Extensionality). 

¬U(A) ∧ ¬U(B) → ((∀x)(x ∈ A ↔ x ∈ B) → A = B)        (E) 

In words, for any sets A and B, if they have the same elements, then they are 
equal. 


† We have already remarked in a footnote on p. 109 that an “abbreviation” is meant to create 
easier-to-read text, not necessarily shorter text. 


III.1.2 Remark. We noted that the above axiom, (E), indicates that we want 
two sets to be equal as long as they have the same elements, regardless of 
the existence of inner structure in the sets (such as one dimensional or higher 
dimensional order) and regardless of “intention”, that is, how the set originally 
came about. For example, the set that contains the integers 2 and 3 is expected 
to be the same as (equal to) the set of all roots of x² − 5x + 6 = 0, despite the 
difference in the two descriptions. That is, we have postulated that set equality 
is “forgetful of structure”. 
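
The example can be checked by hand. The following sketch (in LaTeX notation; the names $S$ and $T$ are ours, introduced only for this illustration) records why the two descriptions have the same extension, so that (E) applies:

```latex
\begin{align*}
x^2 - 5x + 6 &= (x - 2)(x - 3), \\
T = \{x : x^2 - 5x + 6 = 0\} &\text{ has exactly the members } 2 \text{ and } 3, \\
S = \{2, 3\} &\text{ has exactly the members } 2 \text{ and } 3, \\
(\forall x)(x \in S \leftrightarrow x \in T) &\;\Longrightarrow\; S = T \quad \text{by (E)}.
\end{align*}
```

Nothing about how $S$ or $T$ was described enters the last line; only their members do.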

It is the extension of a set (i.e., its actual contents) that decides equality, 
hence the name of Axiom III.1.1. 


But is this axiom “true”?† Is this the condition that governs equality of “real” 
sets? Well, formal or axiomatic mathematics aims at representing reality within 
an artificial but formal and precise language. In this “representation” there is 
always something lost, partly due to limitations of the formal language and 
partly due to decisions that we make — regarding the choice of our assumptions, 
or axioms — about what features of “reality” are essential (of which we create 
counterparts in the formal language) and which are not. 

For example, a “real” line has width no matter how you construct it, but 
geometers have decided that width is irrelevant, so they invent their lines so 
as to have no width. In our case, we are saying that to decide set equality we 
forget all attributes of sets other than what elements they contain. This is what 
we deem to be important. 

Now that we have defended our choice of (E), another question arises: Is 
(E) a definition? Much of the elementary literature on the theory of sets takes 
the point of view that it is (see Wilder (1963, p. 58), for example), although 
often somewhat casually. 

A formal definition would introduce the symbol “=” by (E), if the symbol 
were not part of our “logical list” of symbols. Since we already have “=” and 
its basic axioms, (E), for us, is an axiom.‡ 


Note that in the extensionality axiom we state no more than what we need — 
following the mathematician’s known propensity to assume less in order to have 
the pleasure of proving more. This accounts for using “⋯ → A = B” rather 
than “⋯ ↔ A = B”. In fact, we have 


⊢ ¬U(A) ∧ ¬U(B) → (A = B → (∀x)(x ∈ A ↔ x ∈ B))        (1) 


where “⊢” indicates provability without using any nonlogical axioms. 


† Remember that while we cannot give a proof of consistency of ZFC, we can at least check that its 
axioms are “really true”, i.e., true in the standard model. This checking is done on the informal 
level, of course. 

‡ In fact, a formal definition is still an axiom, via which a new formal symbol is introduced, as 
we saw in Section I.6. But this is not the case with (E). 

To see this, note that 

⊢ A = B → (x ∈ A ↔ x ∈ B) 


by equality axioms. Then, since A = B has no free x,† ∀-introduction (cf. I.4.5, 
p. 44) yields 


⊢ A = B → (∀x)(x ∈ A ↔ x ∈ B) 


(1) now follows by tautological implication (cf. I.4.1, p. 43). 


We have said that urelements have no set-theoretic structure, that is, if b is 
an urelement, then the claim a ∈ b is false for all possible meanings of a. This 
is formalized below. 


III.1.3 Axiom (Urelements are “Atomic”). 

U(y) → ¬(∃x)x ∈ y 


The above says that urelements do not have any elements; however, that does 
not make them empty sets, for urelements are not sets. The content of III.1.3 
can also be written as 


U(y) → (∀x)¬x ∈ y 

or even 

U(y) → (∀x)x ∉ y 


where “x ∉ y” is an informal (metamathematical) abbreviation of “¬x ∈ y”. □ 


III.1.4 Remark. The contrapositive of III.1.3 is 

(∃x)x ∈ y → ¬U(y) 


that is, intuitively, “if y has any elements, then it is a set”. 
It is also useful to note the consequence 


x ∈ y → ¬U(y) 


of the above (substitution axiom x ∈ y → (∃x)x ∈ y and tautological 
implication). 


III.1.5 Definition (Subsets, Supersets). We introduce to the formal language 
a new predicate of arity 2, denoted by “⊆”, by the defining axiom 


A ⊆ B ↔ (∀x)(x ∈ A → x ∈ B)        (1) 


† A and B are free variables distinct from x. 

In English, in the case where A and B are sets, this says (that is, the semantics 
in the metatheory is) that A ⊆ B is a short name for the statement “every 
member of A is also a member of B”. 

We read “A ⊆ B” as “A is a subset of B”, or “B is a superset of A”. Instead 
of A ⊆ B we sometimes write B ⊇ A. As usual, we negate ⊆ and ⊇ by writing 
⊈ and ⊉ respectively. 


III.1.6 Remark. In III.1.5 we chose to allow the symbol ⊆ to act on any objects 
of set theory (sets or atoms). An alternative approach that is often adopted in the 
literature on naive set theory is to make A ⊆ B undefined (or meaningless) if 
either A or B is an atom (this would be analogous to the situation in Euclidean 
geometry, where, for example, parallelism is undefined on, say, triangles or 
circles). Our choice in III.1.5, that is, (1), is technically more convenient, since 
it does not require us to know the exact nature of A or B before we can use the 
(formal) abbreviation A ⊆ B. 

We note that, according to III.1.5, x ∈ A → x ∈ B is provable if A is an 
urelement (by III.1.3), that is, A ⊆ B is provable. Indeed, 


U(A) ⊢_ZFC (∀x)¬x ∈ A 

by Axiom III.1.3 and modus ponens. Thus, 

U(A) ⊢_ZFC ¬x ∈ A        (2) 


by specialization (cf. I.4.6, p. 44). By tautological implication followed by 
generalization (I.4.8) we get what we want from (2): 


U(A) ⊢_ZFC (∀x)(x ∈ A → x ∈ B) 

or, applying the deduction theorem (I.4.19), 

⊢_ZFC U(A) → (∀x)(x ∈ A → x ∈ B) 


We use the provability symbol with a subscript, e.g., ⊢_𝒯, to indicate in 
which theory 𝒯 (i.e., with what nonlogical axioms) we carry out the proof. In 
the simple proofs above we have used the subscript ZFC, but we only employed 
Axiom III.1.3. We will seldom indicate what subset of ZFC axioms we are using 
at any given moment, and whenever we do, we will normally do so in words 
rather than using some ⊢-subscript different from ZFC. 


The reader will also note that “𝒜 ⊢_𝒯 ...” is the same as “𝒯 + 𝒜 ⊢ ...” 
or “𝒯 ∪ {𝒜} ⊢ ...”. 




III.1.7 Example. Since x ∈ a → x ∈ a is a tautology (note the absence of 
subscript on the ⊢ that follows), we have ⊢ x ∈ a → x ∈ a and hence 


⊢ (∀x)(x ∈ a → x ∈ a)        (*) 

by generalization. Thus, by III.1.5 and tautological implication, 

⊢ a ⊆ a        (**) 


or any object is a subset of itself. 
We did not use a subscript (e.g., ZFC) on ⊢ immediately above because no 
ZFC axioms were used. 


We immediately infer from III.1.1 and III.1.5 the following Proposition 
III.1.8. 


III.1.8 Proposition. For any two sets A and B, 

A = B ↔ A ⊆ B ∧ B ⊆ A 

holds, or, formally, 

⊢_ZFC ¬U(A) ∧ ¬U(B) → (A = B ↔ A ⊆ B ∧ B ⊆ A) 


An observation is in order in connection with the above: Logical connectives 
have lower priority than any other connectives, so that “A ⊆ B ∧ B ⊆ A” 
means “(A ⊆ B) ∧ (B ⊆ A)”. 


Proof. Invoking the deduction theorem (I.4.19), we prove instead 


¬U(A), ¬U(B) ⊢_ZFC A = B ↔ A ⊆ B ∧ B ⊆ A        (1) 


We offer a calculational proof: 

A ⊆ B ∧ B ⊆ A 
↔ ⟨III.1.5 and the equivalence theorem (I.4.25, p. 52)⟩ 
(∀x)(x ∈ A → x ∈ B) ∧ (∀x)(x ∈ B → x ∈ A) 
↔ ⟨∀ over ∧ distributivity (Exercise I.23, p. 95)⟩ 
(∀x)((x ∈ A → x ∈ B) ∧ (x ∈ B → x ∈ A)) 
↔ ⟨tautological equivalence and equivalence theorem⟩ 
(∀x)(x ∈ A ↔ x ∈ B) 
↔ ⟨extensionality, plus assumptions for “→”; Leibniz axiom for “←”⟩ 
A = B 

It often happens that A ⊆ B, yet A ≠ B, where “A ≠ B” is (informal) short 
for “¬A = B”. 


III.1.9 Definition (Proper Subsets). We introduce a new predicate symbol of 
arity 2, denoted by “⊂”, by the defining axiom 


A ⊂ B ↔ A ⊆ B ∧ ¬A = B 


We read “A ⊂ B” as “A is a proper subset of B”. 


The reader will note that, stylistically, ⊆ and ⊂ parallel the symbols ≤ and 
< (compare how a < b, for numbers, means a ≤ b and a ≠ b). It should be 
mentioned however that it is not uncommon in the literature (e.g., in Bourbaki 
(1966b), Shoenfield (1967)) to use ⊂ where we use ⊆, and then to need to use 
⊊ or even ⫋ to denote proper subset. 


III.2. Set Terms; Comprehension; Separation 


We now want to imitate the informal act of collecting into a set all objects x that 
satisfy (i.e., make true in the standard model) a formula 𝒜[x]. We already 
saw that a careless approach here entails dangers (Section II.2). We revisit this 
issue here, and then provide a formal fix. 

It is clear that the reasonable thing to do within the formal theory is to restrict 
attention to formulas 𝒜 for which we can prove the existence of a set, say a, 
such that 


x ∈ a ↔ 𝒜[x] 


is provable, thus replacing truth (cf. I.5.15 and also p. 111) by provability. We 
achieve this if we can prove 


⊢_ZFC (∃y)(¬U(y) ∧ (∀x)(x ∈ y ↔ 𝒜[x]))        (1) 


where we have taken the precaution that y is not free in 𝒜.† Bourbaki (1966b) 
calls formulas such as 𝒜 “collecting”. 


As in loc. cit., we use the symbol 


Coll_x 𝒜        (2) 


as an abbreviation of 

(∃y)(¬U(y) ∧ (∀x)(x ∈ y ↔ 𝒜[x])) 


† Otherwise we would be attempting to “solve” for y in something like “x ∈ y ↔ 𝒜(x, y)”, which 
is not the same as collecting in a container called y all those “values” of x that make 𝒜(x, z) 
“true” for an arbitrarily chosen value of the “parameter” z. Such rather obvious remarks will 
become sparser as we go along. 


Note that x is not a free variable in (1) (or in (2)), but it is, nevertheless, the 
free variable of interest in 𝒜, the variable whose “values” (that “satisfy” 𝒜) 
we are determined to collect. (2) says that indeed we can collect these “values” 
into a set. 


III.2.1 Example. We verify that if y is a set, then Coll_x x ∈ y. It will be best to 
give a terse annotated proof of the formal translation of the italicized statement, 
that is, 


⊢_ZFC ¬U(y) → Coll_x x ∈ y        (A) 


It is easier to tackle instead 


¬U(y) ⊢_ZFC (∃z)(¬U(z) ∧ (∀x)(x ∈ z ↔ x ∈ y))        (B) 

where z is distinct from y and x: 

(1) ¬U(y)        ⟨given⟩ 
(2) x ∈ y ↔ x ∈ y        ⟨tautology, hence logical axiom⟩ 
(3) (∀x)(x ∈ y ↔ x ∈ y)        ⟨(2) plus generalization (I.4.8, p. 45)⟩ 
(4) ¬U(y) ∧ (∀x)(x ∈ y ↔ x ∈ y)        ⟨(1), (3) plus taut. implication⟩ 
(5) ¬U(y) ∧ (∀x)(x ∈ y ↔ x ∈ y) → (∃z)(¬U(z) ∧ (∀x)(x ∈ z ↔ x ∈ y))        ⟨logical axiom⟩ 
(6) (∃z)(¬U(z) ∧ (∀x)(x ∈ z ↔ x ∈ y))        ⟨(4), (5), and modus ponens⟩ 

We also note that, by the deduction theorem, (B) yields 


⊢_ZFC ¬U(y) → (∃z)(¬U(z) ∧ (∀x)(x ∈ z ↔ x ∈ y))        (C) 


which is an expanded notation for (A). Intuitively, all of (A), (B), and (C) say 
that if y is a set, then ZFC allows us to collect all the x for which x ∈ y is true 
into a set. This is hardly surprising at the intuitive level, since this collection 
that we form is just y, and we already know it is a set. Nevertheless, it is 
reassuring that we have had no bad surprises here. 


III.2.2 Example (Russell’s Paradox, Second Visit). In this example we go 
out of our way to show that Russell’s paradox can be argued within pure logic 
(in II.2.1 we appeared to be arguing within (informal) set theory) and that it 
has nothing to do with one’s position on the set-theoretic question “x ∈ x?” 
We prove that if 𝒜(x, y) and ℬ(y) are any formulas,† then 


⊢ ¬(∃y)(ℬ(y) ∧ (∀x)(𝒜(x, y) ↔ ¬𝒜(x, x)))        (i) 


We prove (i) by contradiction (I.4.21, p. 51) combined with proof by auxiliary 
constant (I.4.27, p. 53): 


(1) (∃y)(ℬ(y) ∧ (∀x)(𝒜(x, y) ↔ ¬𝒜(x, x)))        ⟨added or “given”⟩ 
(2) ℬ(c) ∧ (∀x)(𝒜(x, c) ↔ ¬𝒜(x, x))        ⟨added; c is a new constant⟩ 
(3) (∀x)(𝒜(x, c) ↔ ¬𝒜(x, x))        ⟨(2) plus tautological implication⟩ 
(4) 𝒜(c, c) ↔ ¬𝒜(c, c)        ⟨(3) plus specialization⟩ 


The formula in line (4) is, or creates, a contradiction, since also ⊢ 𝒜(c, c) ↔ 
𝒜(c, c). 

Taking ℬ(y) to be the special case of ¬U(y), and 𝒜(x, y) to be x ∈ y, 
we have refuted Coll_x x ∉ x, that is, we have established that (note absence 
of subscript on ⊢) ⊢ ¬Coll_x x ∉ x without ever using any nonlogical axioms 
or being aware of the question x ∈ x. Our third (and last) visit to Russell’s 
paradox will show that what is at work here is a Cantor diagonalization. 
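
For concreteness, the instance of steps (2) to (4) used in this refutation can be sketched as follows (LaTeX notation; $c$ is the auxiliary constant of the proof above, and nothing beyond the argument already given is assumed):

```latex
\begin{align*}
&\neg U(c) \land (\forall x)(x \in c \leftrightarrow \neg\, x \in x)
  && \text{(added; $c$ is a new constant)} \\
&(\forall x)(x \in c \leftrightarrow \neg\, x \in x)
  && \text{(tautological implication)} \\
&c \in c \leftrightarrow \neg\, c \in c
  && \text{(specialization, $x := c$)}
\end{align*}
```

The last line contradicts the tautology $c \in c \leftrightarrow c \in c$, and no set-theoretic axiom was consulted along the way.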


Thus, Frege’s (1893) axiom of comprehension that 

for all formulas 𝒜, one has Coll_x 𝒜 

is refutable within pure logic by providing the counterexample 𝒜 ≡ x ∉ x.‡ 


† You may want to look at I.1.11, p. 19. 
‡ The reader may recall that we have reserved “≡” for string equality, not formula equivalence 
(see p. 13). 

Russell’s paradox was not the first paradox discovered in naive set theory 
as originally developed by Georg Cantor. The Burali-Forti antinomy had been 
suggested earlier, and we will get to it at the proper place in our development. 
Russell’s paradox is less technical on the one hand, and is immediately rele- 
vant to our present discussion on the other hand; thus we opted for its early 
presentation. 

Obviously, comprehension, as stated by Frege, was too strong and allowed 
some “super-collections” to be built (like R) which are not sets. 

It should be noted here that in the work of Cantor the comprehension schema 
was used carefully so as not to construct “too large” or “too complicated” sets, 
as compared with the “ingredient” sets that entered into such constructions. For 
this reason, his work did not explicitly lead to Russell’s objection, the latter 
being aimed at Frege. 

We still want to be able to collect into a set all “x-values” that satisfy 
“reasonable” formulas 𝒜, leaving the unreasonable ones out. Let us work towards 
identifying such reasonable formulas. But first a lemma: 


III.2.3 Lemma. 


⊢_ZFC ¬U(y) ∧ (∀x)(x ∈ y ↔ 𝒜) ∧ ¬U(z) ∧ (∀x)(x ∈ z ↔ 𝒜) → y = z 


Proof. 

¬U(y) ∧ (∀x)(x ∈ y ↔ 𝒜) ∧ ¬U(z) ∧ (∀x)(x ∈ z ↔ 𝒜) 
↔ ⟨∀ over ∧ distributivity, taut. equivalence and I.4.25, p. 52⟩ 
¬U(y) ∧ ¬U(z) ∧ (∀x)((x ∈ y ↔ 𝒜) ∧ (x ∈ z ↔ 𝒜)) 
→ ⟨∀-monotonicity (I.4.24, p. 52) and taut. implication⟩ 
¬U(y) ∧ ¬U(z) ∧ (∀x)(x ∈ y ↔ x ∈ z) 
→ ⟨extensionality; only this step used a ZFC axiom⟩ 
y = z 


(Recall the discussion in I.7.11.) 


III.2.4 Remark. Let us recall the basics of introducing new function symbols 
(Section I.6). Suppose that we have the following: 


⊢_𝒯 (∃y)𝒜(y, x⃗ₙ)        (1) 

and 

⊢_𝒯 𝒜(y, x⃗ₙ) ∧ 𝒜(z, x⃗ₙ) → y = z        (2) 


Then we may introduce a new function symbol, say f_𝒜, into the language of 
𝒯 by the axiom 


f_𝒜(x⃗ₙ) = y ↔ 𝒜(y, x⃗ₙ)        (3) 


We also know that (3) is provably equivalent to (4) below (see p. 73), so that (4) 
could serve as well as the introducing axiom: 


𝒜(f_𝒜(x⃗ₙ), x⃗ₙ)        (4) 


We finally recall (p. 73) the notation (Whitehead and Russell (1912)) for the 
term f_𝒜(x⃗ₙ): 


(ιy)𝒜(y, x⃗ₙ)        (5) 


A special case of the above is important. Suppose that t(x⃗ₙ) is a term, where we 
have written “(x⃗ₙ)” to indicate the totality of free variables in t. Then substitution 
in the logical axiom x = x yields t = t; thus the substitution axiom and modus 
ponens yield 


⊢ (∃y)y = t        (6) 


Note the absence of subscript from ⊢ above. Since equality is transitive, we 
also have 


⊢ y = t ∧ z = t → y = z        (7) 


We may thus introduce a new function symbol f_t of arity m ≥ n by the axiom 
(form (3) above) 


f_t(y⃗ₘ) = y ↔ y = t        (8) 

or equivalently (form (4) above) 

f_t(y⃗ₘ) = t        (9) 


where the list y⃗ₘ contains all the variables x⃗ₙ of t. 


An important, more general case of (1)–(2) often occurs in practice. We may 
have a proof of (1) for some, but not all, x⃗ₙ: 


⊢_𝒯 𝒟(x⃗ₙ) → (∃y)𝒜(y, x⃗ₙ)        (10) 

We assume that we still have (2) in the restricted form 

⊢_𝒯 𝒟(x⃗ₙ) → 𝒜(y, x⃗ₙ) ∧ 𝒜(z, x⃗ₙ) → y = z        (11) 


We would like now to introduce a function f_𝒜 that satisfies (3) (or (4)) for 
precisely those x⃗ₙ that satisfy 𝒟. We could define f_𝒜 arbitrarily for those x⃗ₙ for 
which 𝒟 fails. 

Let then a be some constant in the language of 𝒯. Let 


ℬ(y, x⃗ₙ) ≡ 𝒟(x⃗ₙ) ∧ 𝒜(y, x⃗ₙ) ∨ ¬𝒟(x⃗ₙ) ∧ y = a        (12) 


We show that 


⊢_𝒯 ℬ(y, x⃗ₙ) ∧ ℬ(z, x⃗ₙ) → y = z        (13) 


We will employ the deduction theorem: 

(i) 𝒟(x⃗ₙ) ∧ 𝒜(y, x⃗ₙ) ∨ ¬𝒟(x⃗ₙ) ∧ y = a        ⟨assume⟩ 
(ii) 𝒟(x⃗ₙ) ∧ 𝒜(z, x⃗ₙ) ∨ ¬𝒟(x⃗ₙ) ∧ z = a        ⟨assume⟩ 
(iii) 𝒟(x⃗ₙ) ∧ 𝒜(y, x⃗ₙ) ∧ 𝒜(z, x⃗ₙ) ∨ ¬𝒟(x⃗ₙ) ∧ y = a ∧ z = a        ⟨(i), (ii) plus ⊨_Taut⟩ 
(iv) y = z        ⟨proof by cases: 1st disjunct of (iii) plus (11); 2nd disjunct plus trans. of =⟩ 


We next note that 


⊢_𝒯 (∃y)ℬ(y, x⃗ₙ)        (14) 

Indeed, 

(i) 𝒟(x⃗ₙ)        ⟨assume⟩ 
(ii) (∃y)𝒜(y, x⃗ₙ)        ⟨(i), (10) and MP⟩ 
(iii) 𝒜(c, x⃗ₙ)        ⟨assume; c is a new constant⟩ 
(iv) 𝒟(x⃗ₙ) ∧ 𝒜(c, x⃗ₙ) ∨ ¬𝒟(x⃗ₙ) ∧ c = a        ⟨(i), (iii) plus ⊨_Taut⟩ 
(v) (∃y)(𝒟(x⃗ₙ) ∧ 𝒜(y, x⃗ₙ) ∨ ¬𝒟(x⃗ₙ) ∧ y = a)        ⟨(iv), subst. axiom plus MP⟩ 


By the deduction theorem 


⊢_𝒯 𝒟(x⃗ₙ) → (∃y)ℬ(y, x⃗ₙ)        (15) 

Next consider 


(i) ¬𝒟(x⃗ₙ)        ⟨assume⟩ 
(ii) ¬𝒟(x⃗ₙ) ∧ a = a        ⟨(i), ⊢ a = a and ⊨_Taut⟩ 
(iii) 𝒟(x⃗ₙ) ∧ 𝒜(a, x⃗ₙ) ∨ ¬𝒟(x⃗ₙ) ∧ a = a        ⟨(ii) plus ⊨_Taut⟩ 
(iv) (∃y)(𝒟(x⃗ₙ) ∧ 𝒜(y, x⃗ₙ) ∨ ¬𝒟(x⃗ₙ) ∧ y = a)        ⟨(iii), subst. axiom plus MP⟩ 

Thus, by the deduction theorem, 

⊢_𝒯 ¬𝒟(x⃗ₙ) → (∃y)ℬ(y, x⃗ₙ)        (16) 


(15) and (16) yield (14) via proof by cases. (13) and (14) allow the introduction 
of f_𝒜 by the axiom 


ℬ(f_𝒜(x⃗ₙ), x⃗ₙ)        (17) 

That is, 

𝒟(x⃗ₙ) ∧ 𝒜(f_𝒜(x⃗ₙ), x⃗ₙ) ∨ ¬𝒟(x⃗ₙ) ∧ f_𝒜(x⃗ₙ) = a        (18) 

Since 

𝒟(x⃗ₙ), (18) ⊨_Taut 𝒜(f_𝒜(x⃗ₙ), x⃗ₙ) 

and 

¬𝒟(x⃗ₙ), (18) ⊨_Taut f_𝒜(x⃗ₙ) = a 

we get 

(18) ⊢_𝒯 𝒟(x⃗ₙ) → 𝒜(f_𝒜(x⃗ₙ), x⃗ₙ)        (19) 

and 

(18) ⊢_𝒯 ¬𝒟(x⃗ₙ) → f_𝒜(x⃗ₙ) = a        (20) 


In other words, (10) and (11) allow us to introduce a new function symbol f_𝒜 
that satisfies (19) and (20). (20) defines f_𝒜 “arbitrarily” for those x⃗ₙ where 𝒟 
fails. 

It is easy to check, just as we did on p. 73, that (19) is provably equivalent to 


(18) ⊢_𝒯 𝒟(x⃗ₙ) → (𝒜(y, x⃗ₙ) ↔ y = f_𝒜(x⃗ₙ))        (19′) 
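
A familiar instance of this pattern, offered only as an illustration and not taken from the text: in the theory of fields, take $\mathcal{D}(x)$ to be $\neg x = 0$, $\mathcal{A}(y, x)$ to be $x \cdot y = 1$, and the constant $a$ to be $0$. Assuming the usual field axioms (so $\vdash$ below means provability in that theory), the hypotheses (10) and (11) hold, and the introduced symbol, written $\mathrm{inv}$ here as a hypothetical name, satisfies the counterparts of (19) and (20):

```latex
\begin{align*}
&\vdash \neg x = 0 \to (\exists y)\, x \cdot y = 1
  && \text{(instance of (10))} \\
&\vdash \neg x = 0 \to x \cdot y = 1 \land x \cdot z = 1 \to y = z
  && \text{(instance of (11))} \\
&\vdash \neg x = 0 \to x \cdot \mathrm{inv}(x) = 1
  && \text{(instance of (19))} \\
&\vdash \neg\neg x = 0 \to \mathrm{inv}(x) = 0
  && \text{(instance of (20); the hypothesis is equivalent to $x = 0$)}
\end{align*}
```

The value $\mathrm{inv}(0) = 0$ carries no mathematical meaning; it is the "arbitrary" choice that makes the symbol total.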
III.2.5 Definition (Set Terms). If ⊢_ZFC Coll_x 𝒫(x, z⃗ₙ), then Lemma III.2.3 
allows us to introduce the term 


(ιy)(¬U(y) ∧ (∀x)(x ∈ y ↔ 𝒫(x, z⃗ₙ)))        (st) 


We call the above a set term, defined by the formula 𝒫 and the objects 
z₁, ..., zₙ. 

We (almost always) use the shorter, and standard, metamathematical 
abbreviation 


{x : 𝒫(x, z⃗ₙ)}        (sst) 


instead of the notation (st). 


The reader will recall from Section I.6 that, in actual fact, a formal definition 
introduces a function symbol, not a term. However, we agree to leave the 
“ontology” of that function symbol, say, “f_𝒜”, unspecified, and we agree to 
use the argot (st) or (sst) above to informally denote the term, f_𝒜(z⃗ₙ), that 
corresponds to f_𝒜. 


Nevertheless, whenever ⊢_ZFC Coll_x 𝒫, either of the notations (st) or (sst) 
stands for (i.e., names) a formal term of the theory. 


It is important to note that set terms give rise to more complicated terms than 
just variables. The latter are the only terms of the basic language L_Set (see II.3.2), 
while as we enrich the language by the (formal) addition of new function symbols 
f_𝒜, f_ℬ, etc., and constants ∅, ω, etc., we can build complicated terms 
such as f_𝒜(..., f_ℬ(..., ∅, ...), ω, ...) (see I.1.5). Such terms we will call just 
terms (or “formal terms”, to occasionally emphasize their formal status). □ 


We immediately have 


III.2.6 Proposition (Set Term Facts). If ⊢_ZFC Coll_x 𝒫, then: 


(i) ⊢_ZFC y = {x : 𝒫} ↔ ¬U(y) ∧ (∀x)(x ∈ y ↔ 𝒫). 
(ii) ⊢_ZFC ¬U({x : 𝒫}). 
(iii) ⊢_ZFC x ∈ {x : 𝒫} ↔ 𝒫. 
(iv) If also ⊢_ZFC Coll_x 𝒬, then ⊢_ZFC (∀x)(𝒫 → 𝒬) ↔ {x : 𝒫} ⊆ {x : 𝒬}. 
(v) If also ⊢_ZFC Coll_x 𝒬, then ⊢_ZFC (∀x)(𝒫 ↔ 𝒬) ↔ {x : 𝒫} = {x : 𝒬}. 


Proof. (i): This is (3) in III.2.4 above, that is, the introductory axiom for “f_𝒜”, 
where 𝒜 is “¬U(y) ∧ (∀x)(x ∈ y ↔ 𝒫)”. 
(ii): By (4) in III.2.4 and tautological implication. 
(iii): By (4) in III.2.4 and tautological implication followed by specialization. 
(iv): By (iii) and the equivalence theorem, 


⊢_ZFC (∀x)(𝒫 → 𝒬) ↔ (∀x)(x ∈ {x : 𝒫} → x ∈ {x : 𝒬}) 


Note that the assumption ⊢_ZFC Coll_x 𝒬 allows us to introduce {x : 𝒬} formally 
and have (iii) (with 𝒫 replaced by 𝒬). 
(v): Similar to (iv). 


In Section III.4 we will introduce informal notation that allows us to write 
(i)–(v) above in the metatheory without requiring prior proofs of either Coll_x 𝒫 
or Coll_x 𝒬. 


III.2.7 Remark. Note that x is a bound variable in (st) of Definition III.2.5, and 
hence also in (sst). Thus, if the conditions for the variant theorem are fulfilled 
(I.4.13, p. 46), namely that w occurs neither free nor bound in 𝒫, then we can also 
write the set term as {w : 𝒫(w, z⃗ₙ)}. That is, 


⊢_ZFC {x : 𝒫(x, z⃗ₙ)} = {w : 𝒫(w, z⃗ₙ)}        (1) 

The above is different from (v) of III.2.6. It can be proved as follows: 


y = {x : 𝒫(x, z⃗ₙ)} 
↔ ⟨(i) of III.2.6⟩ 
¬U(y) ∧ (∀x)(x ∈ y ↔ 𝒫(x, z⃗ₙ)) 
↔ ⟨variant theorem (I.4.13, p. 46) and equivalence theorem⟩ 
¬U(y) ∧ (∀w)(w ∈ y ↔ 𝒫(w, z⃗ₙ)) 
↔ ⟨(i) of III.2.6⟩ 
y = {w : 𝒫(w, z⃗ₙ)} 


Thus, 

⊢_ZFC y = {x : 𝒫(x, z⃗ₙ)} ↔ y = {w : 𝒫(w, z⃗ₙ)} 


from which, substitution and the logical fact ⊢ t = t for any term t yield (1). 
As a corollary we have (via the equivalence theorem and (1)) the well-known 
and obvious (under the usual non-occurrence restrictions on w) 


⊢_ZFC w ∈ {x : 𝒫[x]} ↔ 𝒫[w]        (2) 


□ 




Formally introduced set terms play a dual role. On one hand, formally, they are 
just meaningless symbol sequences of which we have proved (or a proof exists, 
in any case) that they are sets. For that reason, we often just say “...the set 
{x : 𝒫}...”. 


On the other hand, the formula part of a set term first order defines (in the 
standard structure) some real set; hence the term itself represents or names that 
set. 

The very format of the chosen symbol for set terms, 


{x : 𝒫} 


is suggestive of its semantics in the standard model: “the collection of all the x 
that make 𝒫 true”. As a matter of fact, this is more than notational suggestiveness: 
Soundness of all first order theories (and anticipating that our axioms will 
be true in the standard model) implies that all ZFC theorems will be “really 
true”. In particular, the formula in (iii) of III.2.6 is “true” and says that “x is in 
{x : 𝒫[x]} iff 𝒫[x] is ‘true’ for this x”. □ 


III.2.8 Example. We continue here what we have started in Example III.2.1. 
Since 


⊢_ZFC ¬U(y) → Coll_x x ∈ y 


III.2.6 (iii) gives 


¬U(y) ⊢_ZFC x ∈ {x : x ∈ y} ↔ x ∈ y 

By III.2.6 (ii), 

¬U(y) ⊢_ZFC ¬U({x : x ∈ y}) 

as well; hence 

¬U(y) ⊢_ZFC y = {x : x ∈ y}        (1) 


by extensionality via substitution. 
In words, every set is equal to a set term. □ 


We now introduce a weak form of Frege comprehension, so that we can have 
a sufficient condition for Coll_x 𝒜 to hold. 


III.2.9 Axiom (Schema: Separation or “Subsets” Axioms). For every formula 
𝒫[x] which does not have any free occurrences of y, the following 
is an axiom: 

¬U(A) → (∃y)(¬U(y) ∧ (∀x)(x ∈ y ↔ x ∈ A ∧ 𝒫[x])) 

For short, 

¬U(A) → Coll_x(x ∈ A ∧ 𝒫[x]) 

The above is a schema due to the presence of the arbitrary formula 𝒫. Every 
specific choice of 𝒫 leads to an axiom: an instance of the axiom schema. The 
name “separation axiom” is apt since the axiom allows us to separate members 
from non-members (of a set). 


Why is the schema III.2.9 true (in the standard model)? Well, it says that if A 
is a set, then, no matter what formula 𝒫 we choose, we can also collect 


all those x that make x ∈ A ∧ 𝒫[x] true        (1) 


into a set. 

Now, all those x in (1) are in A, and we know that we have formed A at 
some stage† (it is a set!), say Σ, that comes after all the stages at which all the 
various x in A were formed (or “given”, if atomic). 

Thus, at this very same stage Σ we can collect into a set just those x in A 
that are moreover restricted to satisfy 𝒫[x]. □ 


III.2.10 Definition. Whenever the set term 

{x : x ∈ A ∧ 𝒫} 


can be introduced by III.2.9, it is often written more simply as 


{x ∈ A : 𝒫} 
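
A standard application, sketched here in LaTeX notation as an illustration (the name $R_A$ is ours, and the second line uses the variable-renamed form (2) of III.2.7): taking $\mathcal{P}[x]$ to be $x \notin x$ in III.2.9, every set $A$ yields a set $R_A$ that cannot be a member of $A$. All lines are under the assumption $\neg U(A)$.

```latex
\begin{align*}
&R_A = \{x \in A : x \notin x\}
  && \text{(set term, by III.2.9 with $\mathcal{P}[x]$ taken as $x \notin x$)} \\
&w \in R_A \leftrightarrow w \in A \land w \notin w
  && \text{(III.2.6(iii), with $x$ renamed to $w$)} \\
&R_A \in R_A \leftrightarrow R_A \in A \land R_A \notin R_A
  && \text{(substituting $R_A$ for $w$)} \\
&R_A \notin A
  && \text{(tautological implication)}
\end{align*}
```

Unlike unrestricted comprehension in III.2.2, separation turns the Russell argument into a harmless theorem: the collected set simply falls outside $A$.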


III.2.11 Proposition. 

¬U(a), 𝒫 → x ∈ a ⊢_ZFC (∃y)(¬U(y) ∧ (∀x)(x ∈ y ↔ 𝒫)) 

In words, if a is a set and we know that 𝒫 → x ∈ a, then 
Coll_x 𝒫, 
so we can introduce the set (term) {x : 𝒫}. 


† Principle 1, p. 102, is at work here. Recall that we can take or leave this principle. However, 
we have decided to take it (and hence adopt foundation, later on). It is worth stating that in the 
absence of Principle 1, a “doctrine on limitation of size” would still effectively argue the “truth” 
of separation. 
Proof. 

(∃y)(¬U(y) ∧ (∀x)(x ∈ y ↔ x ∈ a ∧ 𝒫)) 
↔ ⟨equivalence theorem and 𝒜 → ℬ ⊨_Taut 𝒜 ↔ 𝒜 ∧ ℬ⟩ 
(∃y)(¬U(y) ∧ (∀x)(x ∈ y ↔ 𝒫)) 


Done, since the top line is provable in the presence of the assumption ¬U(a) 
(separation). 


III.2.12 Corollary. 

⊢_ZFC ¬U(a) → ((∀x)(𝒫 → x ∈ a) → (∃y)(¬U(y) ∧ (∀x)(x ∈ y ↔ 𝒫))) 


Proof. By the deduction theorem and III.2.11. 


So we can build sets by separation by restricting membership to existing 
sets. Unsatisfactory as this may be — since separation only enables us to build 
“smaller” sets (meaning here subsets) than the ones we have — it gets worse: 
We have no proof that any set exists yet. We fix this in the next and following 
sections. One should note that 


(∃x)x = x 


is a theorem of pure logic (axiom x = x followed by the substitution axiom 
and modus ponens). This says, as far as ZFC is concerned, 


An object exists! (*) 


But what type of object? This may well be an atom, so we still have no proof 
that any set exists. 


III.3. The Set of All Urelements; the Empty Set 

III.3.1 Axiom. The set of all urelements exists: 

Coll_x U(x) 

III.3.2 Definition. We introduce a new constant, M, into our formal language 
by the axiom 

y = M ↔ ¬U(y) ∧ (∀x)(x ∈ y ↔ U(x))        (1) 

since 


⊢_ZFC (∃y)(¬U(y) ∧ (∀x)(x ∈ y ↔ U(x))) 


by III.3.1. 

† Upon reflection, there is nothing unsettling about pure logic proving that “objects” exist in set 
theory. This is simply a consequence of our decision, in logic, not to allow empty structures. 
This decision was also hardwired in the syntactic apparatus of logic. 


That is, we have just introduced the (rather unimaginative) short name M for 
the set term 


{x : U(x)} 

since (1) above yields 

⊢_ZFC y = M ↔ y = {x : U(x)} 

and hence, by substitution, 

⊢_ZFC M = {x : U(x)} □ 


III.3.3 Lemma (Existence of the Empty Set). ⊢_ZFC Coll_x ¬x = x. 


Proof. By ⊢ ¬x = x → x ∈ M and ⊢_ZFC ¬U(M) (the latter by III.3.2) 
plus III.2.12. 


III.3.4 Definition (Empty Set). By III.3.3 we may introduce the set term 

{x : ¬x = x} 

for the empty set. We can then follow this up by the axiom (definition) 

∅ = {x : ¬x = x} 


to introduce (using (9) in III.2.4) the new constant symbol ∅ for the empty set. 


III.3.5 Remark. Referring to III.2.6 (ii) and (iii), we see that the intuitive 
meaning (or “standard semantics”) of ∅ is the “set with no elements”, since 
it is a set, but, moreover, x ∈ ∅ is “false” (equivalent to ¬x = x) for all x. And 
this is as we hoped it would be (refer also to the discussion in Section II.4). 
Syntactically we get the same understanding as follows: By III.2.6 and III.3.4, 


⊢_ZFC x ∈ ∅ ↔ ¬x = x 

Hence, by tautological implication, 


⊢_ZFC ¬x ∈ ∅ ↔ x = x 

Therefore, by the equality axiom x = x and tautological implication, 

⊢_ZFC ¬x ∈ ∅ 

or 

⊢_ZFC x ∉ ∅ 


A by-product of the existence of the empty set is a relaxing of the conditions 
in III.2.11 and III.2.12. We may drop the assumption ¬U(a). Assume 𝒫 → 
x ∈ a. Now there are two cases (proof by cases). If ¬U(a), then we let III.2.11 
or III.2.12 do their thing. If U(a) is the case, then we infer ¬x ∈ a (III.1.3); 
hence x ∈ a → x ∈ ∅ by tautological implication. Another tautological 
implication gives 𝒫 → x ∈ ∅. Since ⊢_ZFC ¬U(∅), we can now invoke III.2.11 
to infer Coll_x 𝒫. □ 


The concluding remark above is worth immortalizing: 


III.3.6 Proposition. ⊢_ZFC (∀x)(𝒫 → x ∈ a) → Coll_x 𝒫. 
Correspondingly, 𝒫 → x ∈ a ⊢_ZFC Coll_x 𝒫. 


III.3.7 Example. We saw how to justify the existence (as a formal mathematical 
object, a set) of a “part of” M in the simple, but very important, case of ∅. 
In general, III.2.12 allows us to prove Coll_x 𝒜 for any 𝒜 for which we know 
that 𝒜 → x ∈ M (either as an assumption, or as a provable ZFC fact). 

For example, we can show that for any a and b in M we can collect these two 
elements into a set of “two” elements, intuitively denoted by “{a, b}”. Indeed, 


⊢ a ∈ M ∧ b ∈ M → x = a ∨ x = b → x ∈ M        (1) 

In fact, 

⊢ a ∈ M → x = a → x ∈ M        (2) 

and 

⊢ b ∈ M → x = b → x ∈ M        (3) 


by the Leibniz axiom. Thus, proof by cases (I.4.26, p. 52) helped by ⊨_Taut gives 


⊢ x = a ∨ x = b → (a ∈ M → x ∈ M) ∨ (b ∈ M → x ∈ M) 


of which (1) is a tautological consequence. 
Thus, 


if a ∈ M and b ∈ M, then Coll_x(x = a ∨ x = b) 

or, in the formal language (with the “Coll” abbreviation), 

⊢_ZFC a ∈ M ∧ b ∈ M → Coll_x(x = a ∨ x = b)        (4) 

One introduces as usual the set term 

{x : x = a ∨ x = b} 

as a follow-up to (4), and also the new symbol (“set term by listing”) 

{a, b} 


i.e., one defines 


{a, b} = {x : x = a ∨ x = b} 
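
Assuming $a \in M$ and $b \in M$, so that (4) makes the set term available (with the roles of $a$ and $b$ exchanged as well), two immediate consequences of III.2.6 can be sketched as follows (LaTeX notation; an illustration, not part of the original development):

```latex
\begin{align*}
&x \in \{a, b\} \leftrightarrow x = a \lor x = b
  && \text{(III.2.6(iii))} \\
&(\forall x)(x = a \lor x = b \leftrightarrow x = b \lor x = a)
  && \text{(tautology plus generalization)} \\
&\{a, b\} = \{b, a\}
  && \text{(III.2.6(v))}
\end{align*}
```

So the listing notation does not record the order in which the elements are named, in line with the "forgetful" character of extensionality.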


Can we repeat the above for any sets a and b? That is, is it true that ⊢_ZFC 
Coll_x(x = a ∨ x = b) for any objects a and b? In particular, can we say that we 
can form the (real) sets {{a}} or {{a}, {b}} in the metatheory? Well, we should 
hope so, since, intuitively, there is a stage after the stages when {a} and {b} 
were built. 

However we need a new axiom to formally guarantee this, because all our 
present axioms are true in the structure with underlying set {∅, 1, {1}} (M = {1} 
here), but so is 


(∀y)(¬U(y) → (∀x)(x ∈ y → U(x))) 

since the members of every set in this structure are atoms. Thus, 

present set of axioms + “no set has set elements”        (1) 

is consistent (cf. I.5.14). Hence 

present set of axioms ⊬ “some set has set elements”        (2) 


Thus, by (2), we cannot prove yet that, in particular, a set {{a}} exists. □ 
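
The three-object structure can be tabulated explicitly. The following LaTeX sketch (the table layout is ours) lists each object, its kind, and its members, which is all that is needed to check the displayed sentence:

```latex
\[
\begin{array}{l|l|l}
\text{object} & \text{kind} & \text{members} \\ \hline
1 & \text{atom, } U(1) & \text{none} \\
\emptyset & \text{set} & \text{none} \\
\{1\} & \text{set, equal to } M & 1
\end{array}
\]
```

Every entry in the last column is either empty or the atom 1, so no set in the structure has a set as a member. In particular, no object of the structure has $\emptyset$ or $\{1\}$ as a member, so nothing in it can play the role of $\{\emptyset\}$ or $\{\{1\}\}$.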


One last comment before we leave this section: We choose not to postulate 
existence of individual urelements, so it may be the case that M = ∅. This 
leaves our options open in the sense that we can have the “usual” ZFC (with 
no urelements) as a special case of our ZFC. We note in this connection that if, 
instead of having a predicate (U) to separate sets from atoms, we adopted a 
two-sorted language with two “types” of object variables, one, say, a, b, c, a′, a″, ..., 
for sets and one, say, p, q, p′, q″, ..., for atoms, then 


⊢ (∃p)(p = p) 


would guarantee the existence of atoms, spoiling our present flexibility. 
III.4. Class Terms and Classes 


Before moving on towards developing tools for building more complicated 
sets, we pause to expand our argot notation in the interest of achieving more 
flexibility. 

ZFC is about sets and atoms. It does not deal with “higher order” objects such 
as the Russell collection (which, we have seen, is not a set), and, moreover, its 
(formal) language has no means of notation for such higher order collections. 
Nevertheless, much is to be gained in notational uniformity, and hence also in 
user-friendliness, by allowing in the metalanguage the use of symbol sequences 
of the form {x : 𝒜}, called “class terms”, even if we have no knowledge of

⊢_ZFC Coll_x 𝒜


Indeed, we want to be able to use in formal (syntactic) contexts the “term”
{x : 𝒜}, even if the above may actually fail. Correspondingly, in semantic
contexts, the symbol sequence {x : 𝒜} serves as a name for a real collection 𝔸
(that is probably too big to be a set) which 𝒜 first order defines in the usual
sense:†

x ∈ 𝔸 iff 𝒜[x] is true in the standard model of ZFC

The collection that 𝔸 names is technically called a class (cf. III.4.3). We, of
course, simply say “𝔸 is a class”.

To protect the innocent I state outright that there is no philosophical significance
in restricting attention to first order definable classes. It is not due to a
lack of belief in the existence of non-definable classes; rather it is due to a lack
of interest in them.


While the intended semantics above is meant to motivate the consideration 
of (possibly non-set) classes, “real classes” do not intrude into our (argot) 
usage of class terms. The latter are employed entirely syntactically. Their use 
is governed by a “calculus of translations” through which we may introduce or 
remove class term abbreviations: 


III.4.1 Informal Definition (Class Terms). For any formula 𝒜 of ZFC, the
symbol sequence

{x : 𝒜} (1)

is called a class term.

† In I.5.15 we saw what it means to first order define a set in a structure. The notion naturally
extends to first order definability of any collection.

We hereby expand the role of this symbol, employing it in the metalanguage
for two purposes: 


(a) If we can show

⊢_ZFC Coll_x 𝒜

then we use (1) to name a (formal) set term as per III.2.5; thus, every set term
is also a class term.

(b) If not, then (1) can still be employed as an abbreviation of certain formal
texts described below (compare with III.2.6):

(i) y = {x : ℱ} and {x : ℱ} = y each stand for the formal text

¬U(y) ∧ (∀x)(x ∈ y ↔ ℱ)

In particular, this reflects the position that a (formal) variable, like y,
stands for an atom or set (here, a set).


“=” in y = {x : ℱ} is not the formal “=”. We are not to parse the informal text
“y = {x : ℱ}”, decomposing it into its ingredients. We take it in its entirety as
an alias for the formal text “¬U(y) ∧ (∀x)(x ∈ y ↔ ℱ)”. A similar comment
applies to informal uses of “=”, “∈”, and “U” below.


(ii) {x : ℱ} = {x : 𝒢} stands for the formula (∀x)(ℱ ↔ 𝒢).

(iii) x ∈ {x : ℱ[x]} stands for the formula ℱ[x], and (see III.2.7) x ∈
{w : ℱ[w]} stands for the formula ℱ[x] (where w is neither free nor
bound in ℱ[x]).

(iv) {x : ℱ} ∈ {x : 𝒢} stands for

(∃y)(y = {x : ℱ[x]} ∧ y ∈ {x : 𝒢[x]})

which (with the help of (i) and (iii)) becomes

(∃y)(¬U(y) ∧ (∀x)(x ∈ y ↔ ℱ[x]) ∧ 𝒢[y])

(v) {x : ℱ} ∈ z stands for

(∃y)(y = {x : ℱ[x]} ∧ y ∈ z)

which (with the help of (i)) becomes

(∃y)(¬U(y) ∧ (∀x)(x ∈ y ↔ ℱ[x]) ∧ y ∈ z)

(vi) U({x : ℱ}) stands for (∀x)¬x = x.

Pause. So U({x : ℱ}) is refutable. Does this prove that {x : ℱ} is a set?


III.4.2 Remark. (1) Ideally, we should have different notations for the symbol
{x : ℱ} according to its status as a name for a set term or not (say, boldface type
in the former case, and lightface in the latter). However, it is typographically more
expedient to use no such notational distinctions but allow instead the context
(established by English text surrounding the use of such terms) to fend off
ambiguities.

(2) We already know that for some formulas 𝒜, ⊢_ZFC ¬Coll_x 𝒜. Semantically,
for such a formula 𝒜, the collection in the metatheory named by the
symbol

{x : 𝒜[x]} (∗)

is not a set.
Indeed, using III.4.1(i) above, we translate the formal “⊢_ZFC ¬Coll_x 𝒜”
into the theorem, written in English,

“There is no set y such that y = {x : 𝒜}” (∗∗)

Then, Platonistically, for such a formula 𝒜 we know that the collection (∗) is
not a set in the metatheory, since the theorems of the formal theory, such as
(∗∗) above, are true in the standard model.

For example, we can state that “{x : x ∉ x} is not a set in the metatheory”.
The quoted fact is the translation of our formal knowledge that “There is no set
y such that y = {x : x ∉ x}”, or in full formal armor†

⊢ ¬(∃y)(¬U(y) ∧ (∀x)(x ∈ y ↔ x ∉ x))

† The reader will recall that this is a fact of logic, whence just “⊢”.

For the semantic and informal side of things and for future reference we
state:

III.4.3 Informal Definition (Real Classes). A (real) class is a collection that is
first order definable in the standard structure (in the language of ZFC). Specifically,
the class term

{x : 𝒜(x, z_1, ..., z_n)} (1)

names a real collection, also denoted by 𝔸(z_1, ..., z_n), that is first order defined
by the formula 𝒜(x, z_1, ..., z_n). That is, for any choice of values for the
parameters z_1, ..., z_n,

x ∈ 𝔸(z_1, ..., z_n) iff 𝒜(x, z_1, ..., z_n) is true

If, for some choice of closed terms t_1, ..., t_n, ⊢_ZFC Coll_x 𝒜(x, t_1, ..., t_n), then
𝔸(t_1, ..., t_n) denotes a real set; otherwise it denotes a non-set class, called a proper class.


For the sake of convenience we will use “blackboard bold” capital letters as
short names of classes; e.g., 𝔸 abbreviates the class term {x : 𝒜} and we may
write, using the metalinguistic “=”,

𝔸 = {x : 𝒜}

These names are metavariables.† We will normally adopt the general convention
of naming a class term by the blackboard bold version of the same letter that
denotes the defining formula.

For example, 𝔸 = 𝔹 is short for {x : 𝒜} = {x : ℬ}, 𝔸 ∈ 𝔹 is short for
{x : 𝒜} ∈ {x : ℬ}, etc., expressions which can be translated into the formal
language using III.4.1.


III.4.4 Remark. (1) Worth repeating: Class terms are just symbols that name
certain entities of our intuition, namely, classes. We will often abuse terminology
and say “let {x : 𝒜} be a class” rather than “let {x : 𝒜} name a class”, just
as one may say (under the assurance of ⊢_ZFC Coll_x 𝒜) “let {x : 𝒜} be a set”.
Properly speaking, a class term is a syntactic object, while a class is a “real”
object.

(2) What class terms and classes do for us is analogous to what number
theory argot does for us in Peano arithmetic (PA). Such argot allows us to
write, e.g., the easily understandable informal text

⊢_PA every n > 1 has a prime divisor

instead of

⊢_PA (∀n)(n > 1 → (∃x)(∃y)(n = x × y ∧ x > 1 ∧ (∀m)(∀r)(x = m × r → m = 1 ∨ m = x)))


† We note that ℕ, ℤ, ℚ, ℝ, ℂ are already reserved for the natural numbers, integers, rational numbers,
reals, and complex numbers. These are metalinguistic constants. Besides, we have already
called the Russell proper class “R”, and later we will use “On” and “Cn” for certain
proper classes. This does not conflict with the blackboard bold notation for indeterminate class
names.

In particular, in the context of class terms one can readily replace “stands
for” by “↔” to write, for example, something like (cf. III.4.1(i))

⊢ y = {x : ℱ} ↔ ¬U(y) ∧ (∀x)(x ∈ y ↔ ℱ) (∗)

We can obtain (an absolute) proof of (∗) by starting with the tautology

¬U(y) ∧ (∀x)(x ∈ y ↔ ℱ) ↔ ¬U(y) ∧ (∀x)(x ∈ y ↔ ℱ)

and then abbreviating the left hand side, “¬U(y) ∧ (∀x)(x ∈ y ↔ ℱ)”, by
“y = {x : ℱ}” using the translation rule III.4.1(i).
(3) Every “real” set named by some formal term t is a class, since (by III.2.8)

y = {x : x ∈ y}

and hence

t = {x : x ∈ t}

by substitution.†


III.4.5 Example.
(a)

y = 𝔸 (1)

(or 𝔸 = y) is very short text for y = {x : 𝒜}, which in turn is short for
(III.4.1(i))

¬U(y) ∧ (∀x)(x ∈ y ↔ 𝒜) (2)

Thus, whenever we claim that we can prove (1), we really mean that we
can prove (2). In particular, such a proof yields also a proof that

(i) Coll_x 𝒜 (by substitution axiom and modus ponens); hence {x : 𝒜} is
(i.e., can be introduced as) a set term; thus, 𝔸 is (denotes) a set.

Pause. What is all this roundabout argument for? Why don’t we just
say, “𝔸, a class, equals a set y.‡ Therefore, it is itself a set”?

(ii) x ∈ y ↔ 𝒜.

† Without loss of generality, x is not free in t.
‡ Recall the convention on variable names. y names a set or atom, but it is not an atom here.

(b)

𝔸 ∈ 𝔹 (3)

is very short for

{x : 𝒜} ∈ {x : ℬ[x]} (4)

which is short for (III.4.1(iv))

(∃y)(¬U(y) ∧ (∀x)(x ∈ y ↔ 𝒜) ∧ ℬ[y]) (5)

Now, to say that we have a proof of (3) (or that we assume (3)) is to say that
we have a proof of (5) (or that we assume (5)). From the latter, tautological
implication along with ∃-monotonicity (I.4.23) yields

(∃y)(¬U(y) ∧ (∀x)(x ∈ y ↔ 𝒜))

that is, Coll_x 𝒜. In words, “(3) implies that 𝔸 is a set”. This corresponds
well with our intention that class members are sets or atoms. Here 𝔸, being
a collection, is not an atom.


III.4.6 Example. By III.4.1(i, ii), if y = 𝔸 then 𝔸 = y, and if 𝔸 = 𝔹 then
𝔹 = 𝔸.† Moreover, 𝔸 = 𝔸 is a theorem of logic, since 𝒜 ↔ 𝒜.

Transitivity of this (informal) class equality is also guaranteed, from
𝒜 ↔ ℬ, ℬ ↔ 𝒞 ⊢_taut 𝒜 ↔ 𝒞 and Definition III.4.1(i).


III.4.7 Informal Definition (Subclass and Superclass of a Class). The notation
𝔸 ⊆ 𝔹 stands for

(∀x)(x ∈ 𝔸 → x ∈ 𝔹)

and is pronounced “𝔸 is a subclass of 𝔹”, or “𝔹 is a superclass of 𝔸”. We can
also write 𝔹 ⊇ 𝔸.

𝔸 ⊂ 𝔹 (also 𝔹 ⊃ 𝔸) stands for 𝔸 ⊆ 𝔹 ∧ ¬𝔸 = 𝔹 and is read “𝔸 is a proper‡
subclass of 𝔹” (also “𝔹 is a proper superclass of 𝔸”).

If 𝔸 ⊆ 𝔹 and 𝔸 is a set we say, as before, that 𝔸 is a subset of 𝔹.

We have at once: 


III.4.8 Proposition.

(i) ⊢ 𝔸 ⊆ 𝔹 ↔ (∀x)(𝒜 → ℬ)
(ii) ⊢ 𝔸 ⊆ 𝔹 ∧ 𝔹 ⊆ 𝔸 ↔ 𝔸 = 𝔹

† Indeed, ⊢ 𝔸 = 𝔹 → 𝔹 = 𝔸 translates to the formal (∀x)(𝒜 ↔ ℬ) → (∀x)(ℬ ↔ 𝒜).
‡ This “proper” qualifies “subclass”, not “class”. Thus a proper subclass could still be a set.

Proof. (i): 𝔸 ⊆ 𝔹 ↔ (∀x)(x ∈ 𝔸 → x ∈ 𝔹) is a tautology “𝒢 ↔ 𝒢” by III.4.7.
We use III.4.1 to eliminate “x ∈ ···”; thus 𝔸 ⊆ 𝔹 ↔ (∀x)(𝒜 → ℬ).
(ii): We translate (ii) using (i) that we have just verified, and III.4.1:

(∀x)(𝒜 → ℬ) ∧ (∀x)(ℬ → 𝒜) ↔ (∀x)(𝒜 ↔ ℬ)

Distribution of ∀ over ∧ to the left of the first ↔, along with the tautology
theorem and equivalence theorem, shows that the above is a theorem of pure
logic.


Pause. So, does (ii) above prove extensionality?

III.4.9 Example. What can we learn from ⊢_ZFC ¬U(y) ∧ 𝔸 ⊆ y? Well, III.2.8,
III.4.7 and III.4.8 allow us to translate the above into

⊢_ZFC ¬U(y) ∧ (∀x)(𝒜 → x ∈ y)

By III.2.12 and modus ponens we get

⊢_ZFC Coll_x 𝒜

That is, 𝔸 is a set. Another way to say all this is

⊢_ZFC ¬U(y) ∧ 𝔸 ⊆ y → Coll_x 𝒜


The above is worth immortalizing, in English: 


III.4.10 Proposition (Class Form of Separation). Any subclass of a set is a 
set. 
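
A finite analogy may help fix the idea behind III.4.10: if 𝔸 ⊆ y, then 𝔸 is exactly the part of y picked out by the defining formula 𝒜, so 𝔸 can be collected by filtering the members of y. The Python sketch below is only an illustration under assumed encodings (sets as frozensets, the formula 𝒜 as a Python predicate); the function name is hypothetical and nothing here replaces the formal argument of III.4.9.

```python
def subclass_as_set(y, formula):
    """Collect the members x of the finite set y for which formula(x) holds."""
    return frozenset(x for x in y if formula(x))


# A toy set y of three atoms and a toy defining formula "x is not b".
y = frozenset({"a", "b", "c"})
A = subclass_as_set(y, lambda x: x != "b")

print(sorted(A))   # the members of y that satisfy the formula
print(A <= y)      # A is a subset of y by construction
```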


III.4.11 Example. We translate the very common informal text “𝔸 ≠ ∅”: first,
into ¬({x : 𝒜} = ∅), and next, taking ⊢_ZFC ∅ = {x : ¬x = x} into account,

¬(∀x)(𝒜 ↔ ¬x = x)

that is,

(∃x)¬(𝒜 ↔ ¬x = x) (1)

But

⊢_taut ¬(𝒞 ↔ 𝒟) ↔ (𝒞 ↔ ¬𝒟)

Thus (1) is provably (within pure logic) equivalent to

(∃x)(𝒜 ↔ x = x) (2)

by the equivalence theorem.
Since x = x is a logical axiom, (2) is provably (within pure logic) equivalent to

(∃x)𝒜

This is the translation of 𝔸 ≠ ∅. Correspondingly, 𝔸 = ∅ translates to ¬(∃x)𝒜.

A class that equals ∅ is called, of course, empty. However, many authors also
use the term void, or null, class if 𝔸 = ∅, and correspondingly, non-void, or
non-null, if 𝔸 ≠ ∅.


III.4.12 Informal Definition (Class Union, Intersection, and Difference).
We introduce the following metalinguistic abbreviations:

(a) 𝔸 ∪ 𝔹, pronounced the union of 𝔸 and 𝔹, abbreviates {x : x ∈ 𝔸 ∨ x ∈ 𝔹}.

(b) 𝔸 ∩ 𝔹, pronounced the intersection of 𝔸 and 𝔹, abbreviates {x : x ∈ 𝔸 ∧
x ∈ 𝔹}. If 𝔸 ∩ 𝔹 = ∅, then we say that 𝔸 and 𝔹 are disjoint.

(c) 𝔸 − 𝔹, pronounced the difference of 𝔸 and 𝔹 in that order, abbreviates
{x : x ∈ 𝔸 ∧ x ∉ 𝔹}.

Some authors use “∼” or even “∖” for difference instead of “−”.


III.4.13 Example. In general, 𝔸 − 𝔹 ≠ 𝔹 − 𝔸. Indeed, if a ≠ b,

⊢ {x : x = a} − {x : x = a ∨ x = b} = ∅

while

⊢ {x : x = a ∨ x = b} − {x : x = a} = {x : x = b}
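
The two differences of III.4.13 can be computed directly in a finite toy model. The sketch below assumes (purely for illustration) that the distinct objects a and b are the Python strings "a" and "b" and that sets are frozensets; the function name is hypothetical. It mirrors the abbreviation {x : x ∈ 𝔸 ∧ x ∉ 𝔹} of III.4.12(c).

```python
def difference(A, B):
    """{x : x ∈ A ∧ x ∉ B} for finite A and B."""
    return frozenset(x for x in A if x not in B)


a, b = "a", "b"   # two distinct objects (assumed encoding)

left = difference(frozenset({a}), frozenset({a, b}))    # {a} − {a, b}
right = difference(frozenset({a, b}), frozenset({a}))   # {a, b} − {a}

print(left)
print(right)
print(left == right)
```

Here the first difference is empty while the second is {b}, so the order of the arguments matters.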


It is immediate that 


III.4.14 Proposition. If 𝔸 or 𝔹 is a set, then so is 𝔸 ∩ 𝔹. If 𝔸 is a set, then so
is 𝔸 − 𝔹.

Proof. By III.4.10 and (1)-(3) below:

(1) ⊢ 𝔸 ∩ 𝔹 ⊆ 𝔸
(2) ⊢ 𝔸 ∩ 𝔹 ⊆ 𝔹
(3) ⊢ 𝔸 − 𝔹 ⊆ 𝔸

To see why (1) holds, we eliminate class terms:

(∀x)(𝒜 ∧ ℬ → 𝒜)

The above is provable in pure logic. Similarly for (2). Also, (3) translates to

⊢ (∀x)(𝒜 ∧ ¬ℬ → 𝒜)


Associativity of each of ∪ and ∩ (Exercises III.19 and III.20) allows one to
omit brackets and write “𝔸 ∩ 𝔹 ∩ ℂ” or, by recursion on n ∈ ℕ,†

III.4.15 Informal Definition.

⋃_{i=1}^{1} 𝔸_i stands for 𝔸_1

⋃_{i=1}^{n+1} 𝔸_i stands for (⋃_{i=1}^{n} 𝔸_i) ∪ 𝔸_{n+1}

and

⋂_{i=1}^{1} 𝔸_i stands for 𝔸_1

⋂_{i=1}^{n+1} 𝔸_i stands for (⋂_{i=1}^{n} 𝔸_i) ∩ 𝔸_{n+1}

The symbols “⋃_{i=1}^{n}” and “⋂_{i=1}^{n}” are also written as “⋃_{1≤i≤n}” and “⋂_{1≤i≤n}”
respectively.

In a moment of weakness one may find oneself writing “𝔸_1 ∪ ··· ∪ 𝔸_n” and
“𝔸_1 ∩ ··· ∩ 𝔸_n” respectively.


III.4.16 Remark (Formal ∩ and Difference). We cannot prove similar results
(to those contained in III.4.14) for union yet. We will have to wait for the axiom
of union.

Note that in a classless approach one could carry out Definition III.4.12 and
Proposition III.4.14 as follows:

For the definition, for example of “∩”, one would introduce a new 2-ary
(binary) function symbol, “∩”, formally by the defining axiom

x ∩ y = z ↔ ¬U(z) ∧ (∀w)(w ∈ z ↔ w ∈ x ∧ w ∈ y) (i)

(i) is legitimate because ⊢_ZFC Coll_w(w ∈ x ∧ w ∈ y). Indeed,

⊢_taut w ∈ x ∧ w ∈ y → w ∈ x (ii)

Thus (by III.3.6)

⊢_ZFC Coll_w(w ∈ x ∧ w ∈ y) (iii)

† Recall that definitions, both formal and informal ones, are effected in the metatheory, where we
have tools such as natural numbers, induction, and recursion over ℕ.


We note that x ∩ y makes sense formally even when one or both of x and y
are atoms, whereas the informal ∩ was defined only for classes.† The defining
axiom (i) and III.1.3 prove

⊢_ZFC U(x) → x ∩ y = ∅

and

⊢_ZFC U(y) → x ∩ y = ∅

Similarly, difference would be introduced formally by

x − y = z ↔ ¬U(z) ∧ (∀w)(w ∈ z ↔ w ∈ x ∧ w ∉ y) (v)

The proof of the legitimacy of (v) is left to the reader. Note that here too
x − y, formally, makes sense for atom “arguments”. Moreover, we have the
“pathological” special cases

⊢_ZFC U(x) → x − y = ∅

and

⊢_ZFC ¬U(x) → U(y) → x − y = x

So much for the formal “∩” and “−”. Of course, the defining axiom for the
formal “∪” will still have to wait for the union axiom.

Since we would have sooner or later to extend the formal “∩”, “∪”, “−”
(recorded below) to the class versions to use in our argot, we decided to introduce
these symbols as (informal) abbreviations to begin with (as is done in,
e.g., Levy (1979)).


III.4.17 Definition. For the record, we introduce the 2-ary function symbols
“∩” and “−” formally to our language, and add the defining axioms (i) and (v)
of III.4.16 to our theory.

The context will easily tell in each use whether we are employing these
symbols formally, or as abbreviations as per III.4.12.

† It is normal practice in a first order language to insist that function symbols stand for totally
defined or total functions, upon interpretation. Thus it is appropriate that ∩ and − are defined on
all objects.

III.4.18 Informal Definition (the Universe; the Universe of All Sets). We
introduce the following abbreviations for the class terms {x : x = x} and
{x : ¬U(x)}:

𝕌_M = {x : x = x}

and

𝕍_M = {x : ¬U(x)}

In the “real” sense, i.e., semantically, 𝕌_M is the universe of all objects set theory
is about, while 𝕍_M is the universe of all sets, i.e., the atoms are not included in
𝕍_M (they are used however to build sets).


The following is immediate. 


III.4.19 Proposition.

(1) 𝕌_M is a proper class.
(2) ⊢_ZFC 𝕌_M = 𝕍_M ∪ M.

Proof. (1): Indeed, R, the Russell class, satisfies ⊢ R ⊆ 𝕌_M, since x ∉ x →
x = x. If 𝕌_M were a set, then so would R be by III.4.10.
(2): 𝕌_M = 𝕍_M ∪ M translates to (by III.3.2, III.4.1(ii) and III.4.12)

x = x ↔ ¬U(x) ∨ U(x)

Once we have the union axiom (which says that the union of two sets is a set),
we will obtain that 𝕍_M is a proper class too (by (2) above and III.3.1).†


III.4.20 Remark (Alternate Universes). (1) The symbol 𝕌_N will, in general,
denote the class of all sets and atoms built from the arbitrarily chosen initial set
of atoms N ⊆ M. If N = ∅, then we simply write 𝕌 rather than 𝕌_∅. The reader
will note that while 𝕌_M (where M is the set of all urelements) is trivially given
by the class term {x : x = x}, it takes a glimpse forward (to Chapter VI) to
show that 𝕌_N can also be defined by a class term (as we require for all classes)
for any N ⊆ M. One way to do this is using the support function, “sp”
(see VI.2.34), namely 𝕌_N = {x : sp(x) ⊆ N}. The latter says, “an object is
in 𝕌_N iff when we disassemble it all the way down to its constituent urelements,
all these urelements are in N”.

† Note that if M ≠ ∅ then M ⊆ R; thus R ⊈ 𝕍_M.

Similarly, we can define the class of all sets whose construction is based on
the urelements in N ⊆ M, 𝕍_N. We write just 𝕍 for 𝕍_∅. We note that 𝕍_N is
given by the class term {x : ¬U(x) ∧ sp(x) ⊆ N}.

(2) In many elementary developments of the subject one often works within
a “reference set”, or “relative universe”, X (a set), and the sets of interest are
subsets or members of X. With this understanding, one would write “−A” or
“Ā” for X − A (where A ⊆ X, and therefore A is a set) and call “−A” the
complement of A (with respect to X). Note that for any set A ∈ 𝕌_N, 𝕌_N − A
(for any N ⊆ M) is a proper class (Exercise III.21); thus we will have little
use for complements. It is the difference (most of the time of sets, rather than
classes) that we will have use for.


We note our intention to use informal class notation extensively. Therefore, 
it is important to remember at all times that ZFC set theory does not admit 
(proper) classes as formal objects of study. 

Invariably, there is nothing that we can say with the help of a class term
{x : 𝒜(x)} that cannot also be said, with a lot more effort and a lot less intuitive
transparency, by just using the formula 𝒜(x) instead (e.g., Bourbaki
(1996b) and Shoenfield (1967) do not employ classes at all). Definition III.4.1
will be used as a tool to eliminate class terms in order to go back to the formal
language of set theory, whenever such caution is necessary (notably in the
introduction of axioms).


III.5. Axiom of Pairing


Consider now any two sets A and B. Say the first was built at stage Σ_A and
the second at stage Σ_B. We have no difficulty imagining that a stage Σ exists
following both these stages. By Principle 0 (p. 102), at stage Σ we can build any
set whose elements are available. In particular A and B are available; thus a set
that contains exactly A and B exists. However, the axiom that flows from this
discussion, the axiom for pairing, will have a simpler form if we allow the
possibility of additional members, beyond A and B, in the set asserted to exist.


III.5.1 Axiom (Unordered Pair). For any atoms or sets a and b there is a set
c such that a ∈ c and b ∈ c. Or, stated in the formal language,

(∃z)(a ∈ z ∧ b ∈ z)

or still (universal closure of the above)

(∀x)(∀y)(∃z)(x ∈ z ∧ y ∈ z)

† Other axiomatizations of set theory, originating with Gödel and Bernays, admit (proper) classes
as formal objects of study. See for example Monk (1969).


III.5.2 Remark. By III.1.3, ⊢_ZFC a ∈ z → ¬U(z). Thus, by tautological implication
⊢_ZFC a ∈ z ∧ b ∈ z ↔ ¬U(z) ∧ a ∈ z ∧ b ∈ z. The equivalence
theorem then gives

⊢_ZFC (∃z)(¬U(z) ∧ a ∈ z ∧ b ∈ z)

Thus the object z guaranteed to exist in III.5.1 is a set, as expected.
III.5.3 Proposition. ⊢_ZFC (∀a)(∀b)Coll_x(x = a ∨ x = b).

Proof. It suffices to prove

⊢_ZFC Coll_x(x = a ∨ x = b)

We have

(1) (∃z)(a ∈ z ∧ b ∈ z)
(2) a ∈ A ∧ b ∈ A
(3) a ∈ A
(4) b ∈ A
(5) ¬U(A)
(6) x = a → (x ∈ A ↔ a ∈ A)    ⟨Leibniz axiom⟩
(7) x = b → (x ∈ A ↔ b ∈ A)    ⟨Leibniz axiom⟩
(8) x = a → x ∈ A    ⟨(6) and (3) and taut. impl.⟩
(9) x = b → x ∈ A    ⟨(7) and (4) and taut. impl.⟩
(10) x = a ∨ x = b → x ∈ A    ⟨(8) and (9) and taut. impl.⟩
(11) Coll_x(x = a ∨ x = b)    ⟨(10) and (5) and III.2.11⟩


III.5.4 Corollary. ⊢_ZFC Coll_x(x = a).

Proof. See the proof above, and use (5) plus (8) and III.2.11. Alternatively
(without referring to the proof), by ⊢_taut x = a ↔ x = a ∨ x = a.
III.5.5 Definition (Pairs and Singletons). The above proposition and its corollary
allow the formal introduction of the set terms

{x : x = a ∨ x = b} (1)

and

{x : x = a} (2)

We also introduce the terms {a, b} (unordered pair) and {a} (singleton) by
the formal definitions (cf. III.2.4)

{a, b} = {x : x = a ∨ x = b}

and

{a} = {x : x = a}


III.5.6 Remark (Denoting Sets by Listing). We say that {a, b} and {a} denote 
sets by explicit listing of their members. We note that the informal notation 
N = {0, 1, 2,...} does not denote a set by explicit listing (in the metatheory). 
Such notation is only possible for what we intuitively understand as “finite” sets. 
The “...” indicates our inability to conclude the listing and hints at a “rule”, 
or understanding, of how to obtain more elements. Such understanding depends 
on the context (in the case of N, just add 1 to get the next member). 


III.5.7 Proposition. ⊢_ZFC {a, b} = {b, a} and ⊢_ZFC {a} = {a, a}.

Proof. By III.2.6, commutativity of ∨, and idempotency of ∨ (i.e., ⊢_taut 𝒜 ↔
𝒜 ∨ 𝒜).


III.5.8 Remark. (i) Why “⊢_ZFC” rather than just “⊢” above? That is because
{a, b} and {a} were formally introduced as terms (sets) in III.5.5. Their introduction
necessitated the prior proof in (our present fragment of) ZFC of the
formulas Coll_x(x = a ∨ x = b) and Coll_x(x = a). As far as the class terms
{x : x = a ∨ x = b} and {x : x = a} are concerned, we have†

⊢ {x : x = a ∨ x = b} = {x : x = b ∨ x = a} (1)

and

⊢ {x : x = a} = {x : x = a ∨ x = a} (2)

by III.4.1, since the above simply abbreviate

⊢ x = a ∨ x = b ↔ x = b ∨ x = a (3)

and

⊢ x = a ↔ x = a ∨ x = a (4)

† Cf. III.4.1 regarding the use of the unbracketed “=” in (1) and (2).

Thus, (1) and (2) are just stating the tautologies in (3) and (4), while Proposition
III.5.7 states much more in between the lines, in particular that {a, b} and
{a} are sets.

If we had introduced the pair and singleton as abbreviations of the respective
class terms instead,‡ then the above proposition would be provable in pure
logic (for it would be just stating (3) and (4)), and whether or not the terms
referenced are sets would be a separate issue.

This remark was necessitated by our decision not to differentiate the notations
for set terms and class terms.

(ii) Proposition III.5.7 is popularized in naive set theory by saying “when
we list the elements of a set explicitly, multiplicity or order of elements does
not matter”.


III.5.9 Remark (Relaxing the Proof Style). It would be counterproductive to
introduce a rich argot towards the simplification of formal texts on one hand,
while on the other hand we continue to offer extremely detailed formal proofs
such as the one for III.5.3. Well, we do not have to be that formal always,
nor can we afford to be so when our arguments get more involved. We will
frequently relax the proof style to shorten proofs. This relaxing will invariably
use shorthand tools such as English text, class terms, and a judicious omission
of (proof) details.

For example, a relaxed version of the proof of III.5.3 would read like this:
Let a and b be any objects (i.e., sets or urelements). Let us denote by c any
set (asserted to exist in III.5.1) such that a ∈ c and b ∈ c [this combines
steps (1)-(5) of the formal proof]. Thus {x : x = a ∨ x = b} ⊆ c; hence
{x : x = a ∨ x = b} (denotes) is a set by separation (III.4.10) [the obvious
steps (6)-(10) were just compacted to (10)].

While the axiom of pairing is not provable from the axioms that we had at our
disposal prior to its introduction (see p. 133), it becomes provable once (an
appropriate version of) collection and power set axioms are introduced (see
Exercises III.15 and III.16).

‡ This is how it is often done; e.g., Levy (1979).


III.6. Axiom of Union 


How about the classes {x : x = a ∨ x = b ∨ x = c} and {x : x = a ∨ x = b ∨ x =
c ∨ x = d} (for short, {a, b, c} and {a, b, c, d}) where a, b, c, d are arbitrary
objects? Are these sets?

Of course, we could invoke Principle 0 (p. 102) again, and show that these 
classes are sets indeed. However, it is not fitting an axiomatic approach to go 
back to this metamathematical principle all the time. It is more elegant — and 
safer — to have just one axiom that will imply that all such objects are sets. 
What we have in mind is something more powerful than an endless sequence 
of axioms for (unordered) triple, quadruple, etc. 

We already know that {a, b}, {c, d}, and {c} are sets. Then, applying pairing 
again, the following are sets too: 


{{a, b}, {c}} (1) 


and 


{{a, b}, {c, d}} (2) 


What we need is the ability to remove the level of braces just below the outer- 
most, to obtain the (unordered) triple (from (1)) and quadruple (from (2)). In 
essence, we want to know that, in particular, {a, b} U {c} and {a, b} U {c, d} are 
sets. 


We will address this immediately, but in a somewhat more general setting.
First, we will define the operation that removes the “top level” of braces of all
non-atomic members of a class. To this end, and in all that follows in this volume,
we will benefit from a notational device. We often use bounded quantification in
set theory, i.e., “there is an x in 𝔸 such that...” and “for all x in 𝔸 it follows
that ...”.

III.6.1 Informal Definition (Bounded Quantification). The notations (∃x ∈
𝔸)ℱ and (∀x ∈ 𝔸)ℱ are short forms of (∃x)(x ∈ 𝔸 ∧ ℱ) and (∀x)(x ∈ 𝔸 →
ℱ) respectively.


III.6.2 Example. We can easily verify that De Morgan’s laws hold for bounded
quantification, i.e.,

⊢ (∃x ∈ 𝔸)ℱ ↔ ¬(∀x ∈ 𝔸)¬ℱ

Indeed,

¬(∀x ∈ 𝔸)¬ℱ
↔ ⟨by III.6.1⟩
¬(∀x)(x ∈ 𝔸 → ¬ℱ)
↔ ⟨equiv. theorem⟩
¬(∀x)(¬x ∈ 𝔸 ∨ ¬ℱ)
↔ ⟨“∀-De Morgan”⟩
(∃x)¬(¬x ∈ 𝔸 ∨ ¬ℱ)
↔ ⟨“∨-De Morgan” and equiv. theorem⟩
(∃x)(x ∈ 𝔸 ∧ ℱ)
↔ ⟨by III.6.1⟩
(∃x ∈ 𝔸)ℱ


III.6.3 Informal Definition (Union of a Class; Union of a Family of
Sets). Let 𝔸 be a class. The symbol ⋃𝔸 is an abbreviation of the class term
{x : (∃y ∈ 𝔸)x ∈ y}. We read ⋃𝔸 as the union of all the sets in 𝔸.

If 𝔸 contains no atoms, then it is called a family of sets, and ⋃𝔸 is the union
of the family 𝔸.

III.6.4 Remark. Let 𝔸 = {x : 𝒜[x]}. We have a number of variations in the
notation for ⋃𝔸, namely, ⋃_{x∈𝔸} x or ⋃{x : 𝒜[x]} or ⋃_{𝒜[x]} x. In any case,
after we eliminate class notation, all these notations stand for the class term
{x : (∃y)(𝒜[y] ∧ x ∈ y)}.


III.6.5 Example. ⋃{#, {|}, {1, {2}}} = {|, 1, {2}}, where “#”, “|”, “1”, “2”
are names for atoms. So, in the result of the union, “loose atoms” are lost.
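
The computation in III.6.5 can be replayed in a finite toy model. The sketch below assumes (for illustration only) that atoms are encoded as Python strings and sets as frozensets; the names is_atom and big_union are hypothetical. It implements {x : (∃y ∈ A)x ∈ y} literally, skipping atomic members of A because atoms have no elements.

```python
def is_atom(x):
    """Plays the role of the predicate U (assumed encoding: atoms are strings)."""
    return isinstance(x, str)


def big_union(A):
    """{x : (∃y ∈ A) x ∈ y}; atomic members y of A contribute nothing."""
    return frozenset(x for y in A if not is_atom(y) for x in y)


# The class of III.6.5: the loose atom "#", the set {|}, and the set {1, {2}}.
A = frozenset({"#", frozenset({"|"}), frozenset({"1", frozenset({"2"})})})

result = big_union(A)
print(result == frozenset({"|", "1", frozenset({"2"})}))
print("#" in result)   # the loose atom "#" does not survive
```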


Let now A be a set, and consider ⋃A, that is, {x : (∃y ∈ A)x ∈ y}. Let A
be formed at stage Σ. Then each y ∈ A must be available before Σ, and since
x ∈ y for each x that we collect in ⋃A, a fortiori, x is available before Σ. It
follows that ⋃A itself can be built at Σ as a set, so it is a set. As in the case of
pairing, we state the following axiom of union in a “weak” form. It asserts the
existence of a set that contains the union as a subclass. This, by III.4.10, makes
the union a set.

III.6.6 Axiom (Union).

(∃z)(∀x)(∀y)(x ∈ y ∧ y ∈ A → x ∈ z) (1)


III.6.7 Remark. Formula (1) has A as its only free variable. Of course, it is
equivalent to a version that is prefixed with “(∀A)”. Now, this axiom is stated
a bit too tersely (especially for our flavour of ZFC, which allows atoms) and
needs some “parsing”.

(a) (1) is provably equivalent to

¬U(A) → (∃z)(∀x)(∀y)(x ∈ y ∧ y ∈ A → x ∈ z) (2)

Indeed, (2) is a tautological consequence of (1). Conversely, (1) follows from (2)
and proof by cases, because we can also prove

U(A) → (∃z)(∀x)(∀y)(x ∈ y ∧ y ∈ A → x ∈ z) (3)

Let us do this. We have

U(A) ⊢_ZFC ¬y ∈ A

Thus, by tautological implication,

U(A) ⊢_ZFC x ∈ y ∧ y ∈ A → x ∈ z

Now, generalization followed by an invocation of the substitution axiom
gives

U(A) ⊢_ZFC (∃z)(∀x)(∀y)(x ∈ y ∧ y ∈ A → x ∈ z)

from which the deduction theorem yields (3).


(b) (1) does not ask that z, whose existence it postulates, be a set (it could
be an atom). However, we can show using (1) that

⊢_ZFC (∃z)(¬U(z) ∧ (∀x)(∀y)(x ∈ y ∧ y ∈ A → x ∈ z)) (4)

To see this, let B be a z that works (we are arguing by auxiliary constant) in (1).
Thus we add (to ZFC) the assumption

(∀x)(∀y)(x ∈ y ∧ y ∈ A → x ∈ B) (5)

Hence

x ∈ y ∧ y ∈ A → x ∈ B (6)

We have two cases (proof by cases):

• Let (i.e., add) ¬U(B). By (5) and tautological implication, we get

¬U(B) ∧ (∀x)(∀y)(x ∈ y ∧ y ∈ A → x ∈ B)

and then the substitution axiom yields (4).
• Let (i.e., add) U(B). By III.1.3 and the assumption we obtain ¬x ∈ B; hence

x ∈ B → x ∈ ∅

by tautological implication. (6) and tautological implication (followed by
generalization) yield

(∀x)(∀y)(x ∈ y ∧ y ∈ A → x ∈ ∅)

from which

¬U(∅) ∧ (∀x)(∀y)(x ∈ y ∧ y ∈ A → x ∈ ∅)

Once again, the substitution axiom yields (4).


III.6.8 Proposition. ⊢_ZFC Coll_x (∃y ∈ A)x ∈ y, where A is a free variable.

Proof. We use (4) of III.6.7(b). Add then a new constant B and the assumption

¬U(B) ∧ (∀x)(∀y)(x ∈ y ∧ y ∈ A → x ∈ B)

Thus

¬U(B) (i)

and

x ∈ y ∧ y ∈ A → x ∈ B (ii)

We can now show that

(∃y)(y ∈ A ∧ x ∈ y) → x ∈ B (iii)

which will rest the case by III.2.11 and (i). Well, (iii) follows from (ii) by
∃-introduction.


III.6.9 Definition (The Formal Big ⋃ and Little ∪). For the record, we
introduce into our theory a unary (1-ary) function symbol, “⋃”, formally, by
the defining axiom

⋃A = z ↔ ¬U(z) ∧ (∀x)(x ∈ z ↔ (∃y ∈ A)x ∈ y) (1)

We also introduce a new binary (or 2-ary) function symbol, “∪”, by the defining
axiom

x ∪ y = ⋃{x, y} (2)

Worth repeating: If 𝔸 is a set, then so is ⋃𝔸. Indeed, the assumption translates
to Coll_x 𝒜; hence the class term 𝔸 (that is, {x : 𝒜}) is (really, names) a
formal term “t” of set theory. So is ⋃t by definition of terms, and III.6.9.

But is it an atom? Since ⊢_ZFC ¬U(⋃x) by the preceding definition, where
x is a free variable, ⊢_ZFC ¬U(⋃t) by substitution.


III.6.10 Remark. By III.6.8 the function ⋃ “makes sense” for both set and
atom variables. It is trivial to see from (1) above that

⊢_ZFC U(A) → ⋃A = ∅

It follows that the binary formal ∪ also makes sense for any arguments and that
⊢_ZFC U(A) ∧ U(B) → A ∪ B = ∅.


III.6.11 Example. What is ⋃{a, b}? How does it relate to the informal definition
(III.4.12, p. 141)? Let us calculate using III.6.3:

{x : (∃y)(y ∈ {a, b} ∧ x ∈ y)} = {x : (∃y)((y = a ∨ y = b) ∧ x ∈ y)}
= {x : (∃y)(y = a ∧ x ∈ y ∨ y = b ∧ x ∈ y)}
= {x : (∃y)(y = a ∧ x ∈ y) ∨ (∃y)(y = b ∧ x ∈ y)}
= {x : x ∈ a ∨ x ∈ b}
= a ∪ b

The second “=” from the bottom was by application of the “one-point rule”
(I.6.2). Note that in “a ∪ b” we are using the formal “∪” to allow this term to
be meaningful for both sets and atoms a, b.†
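
The calculation above, together with the atom cases of III.6.10, can be checked in a finite toy model. The sketch below assumes (for illustration only) that atoms are Python strings and sets are frozensets; the names big_union, pair and formal_union are hypothetical stand-ins for the formal “⋃”, the pair {x, y}, and the formal “∪” defined by x ∪ y = ⋃{x, y}.

```python
def is_atom(x):
    """Plays the role of the predicate U (assumed encoding: atoms are strings)."""
    return isinstance(x, str)


def big_union(A):
    """Formal ⋃: the empty set if A is an atom, else {x : (∃y ∈ A) x ∈ y}."""
    if is_atom(A):
        return frozenset()
    return frozenset(x for y in A if not is_atom(y) for x in y)


def pair(a, b):
    """The unordered pair {a, b}."""
    return frozenset({a, b})


def formal_union(a, b):
    """Formal ∪, defined as ⋃{a, b}."""
    return big_union(pair(a, b))


s = frozenset({"1", "2"})
t = frozenset({"2", "3"})
print(formal_union(s, t) == s | t)             # agrees with {x : x ∈ a ∨ x ∈ b}
print(formal_union("p", "q") == frozenset())   # two atoms: the union is empty
print(formal_union("p", s) == s)               # an atom contributes no members
```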


III.6.12 Informal Definition (Intersection of a Family). The intersection of
a family F, in symbols $\bigcap F$, stands for $\{x : (\forall y \in F)\, x \in y\}$.

If for every two A and B in a family F it is the case that $A \ne B \to A \cap B = \emptyset$,
then we say that F consists of pairwise disjoint sets or is a pairwise disjoint
family.


† “$\bigcup\{a, b\}$” of III.6.3 is meaningful for both sets and atoms a, b. So is the formal “$\cup$” of III.6.9,
unlike “$A \cup B$” of III.4.12, which is defined only for class arguments.


154 III, The Axioms of Set Theory 


(1) Operationally, we certify things such as “F is a pairwise disjoint family”
by proving in ZFC the defining property “$A \ne B \to A \cap B = \emptyset$ for all sets A
and B in F”. Correspondingly, a statement such as “Let F be a pairwise disjoint
family” is another way of saying “assume that $A \ne B \to A \cap B = \emptyset$ for all
sets A and B in F”.

(2) We are not interested in the intersection of arbitrary classes (that may
contain atoms, and hence not be families) in introducing the big-$\bigcap$ abbreviation.
We will also make an exception to what we have practiced so far, and we will
not introduce a formal counterpart for $\bigcap$.† It is sufficient that we have a formal
little $\cap$.

Let $A = \{x : \mathcal{A}[x]\}$. We have a number of variations in the notation for
$\bigcap A$:

$\bigcap_{x \in A} x$ or $\bigcap\{x : \mathcal{A}[x]\}$ or $\bigcap_{\mathcal{A}} x$. □


III.6.13 Example. Let F = {{1, 2}, {1, 3}} (this family is a set; working in the
metatheory, apply pairing three times). Then $\bigcap F = \{1\}$.

Let now G be any family, and $a \in G$. Then $\bigcap G \subseteq a$. Indeed, the translation
of the claim (by III.6.12 and III.4.7) is

$a \in G \to (\forall y)(y \in G \to x \in y) \to x \in a$ (1)

We can prove (1) within pure logic: Assume $a \in G$ and $(\forall y)(y \in G \to x \in y)$.
By specialization, $a \in G \to x \in a$; hence (MP) $x \in a$. By the deduction
theorem, (1) is now settled. What happens if $G = \emptyset$? (See Exercise III.18.)
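A finite illustration of the two claims in this example may help. This is a sketch only: families are modeled as frozensets of frozensets, and `big_intersection` and `is_pairwise_disjoint` are hypothetical helper names. The empty family is rejected, since its intersection is not a set.

```python
from functools import reduce
from itertools import combinations

def big_intersection(family):
    """Intersection of a nonempty finite family of sets."""
    members = list(family)
    if not members:
        raise ValueError("the intersection of the empty family is not a set")
    return frozenset(reduce(lambda s, t: s & t, members))

def is_pairwise_disjoint(family):
    """True when any two distinct members of the family have no common element."""
    return all(not (s & t) for s, t in combinations(family, 2))

F = frozenset({frozenset({1, 2}), frozenset({1, 3})})
print(big_intersection(F))
# The intersection is included in every member a of the family.
print(all(big_intersection(F) <= a for a in F))
```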


III.6.14 Proposition (Existence of Intersections). If the family F is nonempty,
then $\bigcap F$ is a set.

Proof. By Example III.6.13 and separation (III.4.10).

Priorities of set operations. “$-$” (that is, difference, as we will not use
complements) and “$\cup$” have the same priority and associate right to left. “$\cap$” is stronger
(associativity is irrelevant by Exercises III.19 and III.20). Thus $A - B \cup C =
A - (B \cup C)$, while $A \cap B - C = (A \cap B) - C$ and $A \cap B \cup C = (A \cap B) \cup C$.
When in doubt, use brackets! □


III.6.15 Proposition (De Morgan’s Laws for Classes). Let A, B, C be arbitrary
classes. Then,

(1) $C - (A \cup B) = (C - A) \cap (C - B)$ and

(2) $C - (A \cap B) = (C - A) \cup (C - B)$.


† We do not feel inclined to perform acrobatics just to get around the fact that $\bigcap \emptyset$ cannot be a
formal term: it is not a set (see Example III.6.13 below).
Proof. We do (1) imitating the way people normally argue this type of thing,
“at the element level”. The proof uses pure logic and Definitions III.4.1, III.4.3,
III.4.8, and III.4.12.

$\subseteq$: Let $x \in C - (A \cup B)$. Then
$x \in C$ (i)
and
$x \notin (A \cup B)$ (ii)
By definition of $\cup$, (ii) yields
$x \notin A \land x \notin B$ (iii)

Combine (i) and (iii) to get (by definition of difference)
$x \in C - A \land x \in C - B$

or (by definition of $\cap$)

$x \in (C - A) \cap (C - B)$

Done, by the deduction theorem.

$\supseteq$: Let $x \in (C - A) \cap (C - B)$. Then

$x \in C - A \land x \in C - B$

Hence
$x \in C$ (iv)
and
$x \notin A \land x \notin B$
This last one says (by definition of $\cup$)
$x \notin (A \cup B)$
which along with (iv) gives
$x \in C - (A \cup B)$

Case (2) is left as Exercise III.26.


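III.6.16 Example. A better way, perhaps, is to use translations and reduce the
issue to a tautology: (1) above translates to (III.4.1, III.4.3 and III.4.12)

$\mathcal{C} \land \neg(\mathcal{A} \lor \mathcal{B}) \leftrightarrow (\mathcal{C} \land \neg\mathcal{A}) \land (\mathcal{C} \land \neg\mathcal{B})$
Noting that (by propositional De Morgan’s laws)
$\models_{Taut} \mathcal{C} \land \neg(\mathcal{A} \lor \mathcal{B}) \leftrightarrow \mathcal{C} \land (\neg\mathcal{A} \land \neg\mathcal{B})$

we are done.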
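The reduction to a tautology can be checked mechanically. The sketch below enumerates all eight truth assignments for the propositional translation of (1), and then spot-checks both De Morgan laws on three small sets chosen arbitrarily for illustration (the names `lhs`, `rhs`, and the sample sets are assumptions, not from the text).

```python
from itertools import product

def lhs(c, a, b):
    """Translation of x in C - (A u B)."""
    return c and not (a or b)

def rhs(c, a, b):
    """Translation of x in (C - A) n (C - B)."""
    return (c and not a) and (c and not b)

# The equivalence holds under every truth assignment, so it is a tautology.
is_tautology = all(lhs(*v) == rhs(*v) for v in product([False, True], repeat=3))
print(is_tautology)

# Spot check of both laws on sample finite sets.
A, B, C = {1, 2}, {2, 3}, {1, 2, 3, 4}
law_1 = C - (A | B) == (C - A) & (C - B)
law_2 = C - (A & B) == (C - A) | (C - B)
print(law_1, law_2)
```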
III.7. Axiom of Foundation


III.7.1 Example. We have seen that the “absolute universe”, $U_M = \{x : x = x\}$,
is a proper class.

The Russell paradox argument does not depend on what exactly M is; therefore
an alternate Russell class, $\{x \in U_N : x \notin x\}$, exists in all alternate universes
$U_N$ (where $\emptyset \subseteq N \subseteq M$; see III.4.20). Thus all universes $U_N$ are also proper
classes.


Informal Discussion (towards Foundation). In preparation for the axiom of
foundation, we next reexamine the “magic” of the statements $x \notin x$ and $x \in x$.
Some people react to Russell’s paradox by blaming it on an expectation that
$x \in x$ might be true for some x. This is not the right attitude, regardless of what
we think the answer to the question $x \in x$ is. After all, there is an alternative
“theory of sets” where $x \in x$ is possible, and this theory is consistent if ZFC
is; so, in particular, it does not suffer from Russell’s paradox.†

What really is taking place in the Russell argument is a diagonalization — 
a technique introduced by Cantor to show that there are “more” real numbers 
than natural numbers — and this has nothing to do with whether x € x is, or is 
not, “really true”. 

We can visualize this diagonalization as follows. Arrange all atoms and sets 
into a matrix as in the figure below: 


[Figure: an infinite matrix whose columns and rows are both labeled a, b, c, ..., x, ...; each entry is marked i.]


The a,b,c,...that label the columns and rows are all the sets and atoms 
arranged in some fashion. We may call these labels the heads of the respective 
rows or columns that they label. 

† See Barwise and Moss (1991) for an introduction to hypersets.

Each entry, i, can have the value 0 or 1 or the name of an atom (without
loss of generality, we assume that no atom has as name 0 or 1). This value is
determined as follows:

$\text{entry at row } z \text{ and column } w = \begin{cases} 0 & \text{if } z \in w \\ 1 & \text{if } z \notin w \land \neg U(w) \\ w & \text{if } U(w) \text{ [N.B. This entails } z \notin w\text{]} \end{cases}$

Here are a few examples: Say a is an atom. Then all the i’s in column a have
value a. Let b = {b, D}.† Then column b has 1 everywhere except on rows b
and D, where it has 0. Conversely, the head of a column sequence of 0, 1, and
atom values is determined by the sequence.‡ For example, the sequence of 1
everywhere determines $\emptyset$; the sequence

1 1 0 1 ... (all 1 from there on)

i.e., the one that has 1 everywhere except at (row) position c, where it has a 0,
determines the set {c}.


Let us now define informally a sequence, and therefore a class that we will
call R, by going along the main diagonal (that is, along the matrix entries
(a, a), (b, b), (c, c), ...) “reversing” all the i-values (specifically, nonzero to 0
and 0 to 1). That is,

the sequence for R has $\begin{cases} 0 \text{ at position } x & \text{if entry}(x, x) \text{ is not a } 0 \\ 1 \text{ at position } x & \text{if entry}(x, x) \text{ is a } 0 \end{cases}$ (1)

It follows that the sequence for R differs from the sequence for any x at
position x.

Thus, R cannot occur anywhere in the matrix (as a column), for if it occurred
as column x, it then would be “schizophrenic” at matrix entry (x, x); so it is
not a set (recall, the matrix represents all sets and atoms as columns).

What is the connection with Russell’s paradox? Well, in (1) we are saying
that $x \in R$ iff $x \notin x$; hence R is exactly the Russell class! The above diagonalization
can be readily adapted to “construct” a set that is not in a given set b. All we
have to do is to think of the matrix as “listing”, or representing, just all the
atoms and sets in b rather than in $U_M$ (see Exercise III.6).
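The diagonal construction can be tried on a finite fragment of the matrix. This is a toy sketch under stated assumptions: the heads are a short list of names, `members` is an arbitrary membership table (so a self-membered head such as b is allowed, in the spirit of the hyperset footnote), and `entry`, `column`, and `diagonal_sequence` are hypothetical helper names.

```python
def entry(z, w, members, atoms):
    """Matrix entry at row z, column w: the atom's name, or 0 if z is in w, else 1."""
    if w in atoms:
        return w
    return 0 if z in members[w] else 1

def column(w, heads, members, atoms):
    """The whole sequence found in column w."""
    return {z: entry(z, w, members, atoms) for z in heads}

def diagonal_sequence(heads, members, atoms):
    """Reverse the diagonal: nonzero becomes 0, and 0 becomes 1."""
    return {x: (1 if entry(x, x, members, atoms) == 0 else 0) for x in heads}

heads = ["p", "a", "b", "c"]   # "p" names an atom; a, b, c name sets
atoms = {"p"}
members = {"a": set(), "b": {"b", "p"}, "c": {"a"}}  # note that b is a member of b

R = diagonal_sequence(heads, members, atoms)
# R differs from column x at position x, so R is not among the listed columns.
print(all(column(x, heads, members, atoms)[x] != R[x] for x in heads))
# A 0 at position x means "x is a member": R collects the x that are not in x.
print({x for x in heads if R[x] == 0})
```

Because the table is finite, R here is simply an object missing from the list, which is the adaptation to a given set b mentioned above.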


† In view of what we said in the preceding footnote regarding hypersets, we allow just for the sake
of this discussion the generality where $b \in b$ is possible.

‡ Do not expect all sequences to appear as columns. For example, the sequence whose members
are all 0 denotes $U_M$, but our matrix heads are only sets or atoms.
We have chosen to describe (by ZFC) a standard model where, among its
properties, we have that $x \in x$ is always false, by Principle 1.† Similarly,
$a \in b \in a$ and, more generally, $a \in b \in \cdots \in a$ are absurd in our model, for
the leftmost a should be available before the rightmost a for such a chain of
memberships to be valid (Principle 1).

Note that if, say, $a \in b \in a$ were possible, then we would get the “infinite”
chain

$\cdots a \in b \in a \in b \in a \in b \in a$ (1)

i.e., a would be “bottomless”, like an infinite regression of a “box in a box in a
box in a box...”. Sets that are not bottomless are called well-founded.

A bit more can be said of the standard model. It is not only “repeating”
chains such as (1) that are not possible, but likewise non-repeating infinite
“descending” chains such as

$\cdots \in a_n \in a_{n-1} \in \cdots \in a_2 \in a_1 \in a_0$ (2)

There are no bottomless sets, period.

Towards formulating an appropriate axiom of ZFC that says “bottomless sets
do not exist”, let A be any nonempty class. Assume that it contains no atoms.
Now, there must be a set (maybe more than one) in A that was constructed no
later than any other set in A (for example, if # and | name atoms, and if $\{\#\} \in A$
and $\{|\} \in A$, then {#} and {|} are two among those sets in A that are constructed
the earliest possible).

Let now, in general, y be such an earliest-constructed set in A, and let $x \in y$.
It follows that $x \notin A$ (for x is an atom, hence not in A, or is a set built
before y). The existence of sets like y in A captures foundation. Thus, taking
A to contain precisely the members of (2), we see that (2) is absurd. □


III.7.2 Axiom (Foundation Schema). Class form:
$A \ne \emptyset \to (\exists y \in A)(\neg(\exists x \in y)\, x \in A)$

or, applying De Morgan’s laws,

$A \ne \emptyset \to (\exists y \in A)(\forall x \in y)\, x \notin A$
The axiom expressed in the formal language is the schema

$(\exists y)\mathcal{A}[y] \to (\exists y)(\mathcal{A}[y] \land \neg(\exists x \in y)\mathcal{A}[x])$


† Note that in such a state of affairs all entries (x, x) are nonzero; thus R is the sequence composed
entirely of 0, representing $U_M$. This is as expected, since now $R = U_M$.
III.7.3 Remark.

(1) The foundation axiom (schema) is also called the regularity axiom.

(2) The schema version of foundation is due to Skolem (1923). It readily
implies (using $\mathcal{A} \equiv y \in A$), and is implied† by, the single-axiom
(non-schema) set version where A is a free variable other than y:

$(\exists y)\, y \in A \to (\exists y)(y \in A \land \neg(\exists x \in y)\, x \in A)$
or
$\neg U(A) \to A \ne \emptyset \to (\exists y \in A)(\neg(\exists x \in y)\, x \in A)$

(3) The discussion that motivated III.7.2 was in terms of a class A that contained
no atoms. No such restriction is stated in III.7.2, for trivially, if A does
contain atoms, any such atom will do for y. If it is known that A is a family
of sets (i.e., that it contains no atoms), then foundation simplifies to

$A \ne \emptyset \to (\exists y \in A)\, y \cap A = \emptyset$

(4) If for a minute we write < for $\in$, then III.7.2 (formal language version)
reads exactly as the least number principle on N. Of course, $\in$ is not an
order on all sets; however, if its scope is restricted on appropriate sets, then
it becomes an order, and III.7.2 makes it a well-ordering. More on this in
Chapter VI.
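For hereditarily finite objects the witness y promised by foundation can simply be searched for. The sketch below assumes sets are Python frozensets and atoms are any other value; `minimal_elements` is a hypothetical helper name. An atom in A always qualifies, and a set y qualifies exactly when it shares no member with A, as in (3).

```python
def is_atom(obj):
    """Atoms are modeled as any non-frozenset value (an assumption)."""
    return not isinstance(obj, frozenset)

def minimal_elements(A):
    """All y in A such that no member of y is itself in A."""
    return [y for y in A if is_atom(y) or not (y & A)]

empty = frozenset()
one = frozenset({empty})       # {0}
two = frozenset({empty, one})  # {0, {0}}

A = frozenset({one, two})
print(minimal_elements(A) == [one])   # two has the member one, which lies in A

A_with_atom = frozenset({"p", two})
print(len(minimal_elements(A_with_atom)))  # the atom and two both qualify
```

Python's frozensets cannot contain themselves, so every nonempty finite example of this kind has at least one such y; the axiom asserts the same for all sets.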


III.7.4 Example. Let us derive once again the falsehood of $a \in a$ and $a \in b \in a$,
this time formally, using the axiom (schema) of foundation.

Given a and b (sets or atoms), the sets S = {a} and T = {a, b} exist,‡ as we
saw earlier. Since $S \ne \emptyset$, there is a $y \in S$ such that $x \in y$ is false for all $x \in S$
(III.7.2). The only candidate for either y or x is a. Thus, $a \in a$ is false.

O.K., let us repeat the above in a (formal) manner so that we will not be
accused of arguing semantically (saying things like “false”, colloquial for
“refutable”, and the like):

$\vdash_{ZFC} \neg U(\{a\}) \to \{a\} \ne \emptyset \to (\exists y \in \{a\})(\neg(\exists x \in y)\, x \in \{a\})$
by III.7.3(2) and III.5.5. Since $\vdash_{ZFC} \neg U(\{a\})$ and $\vdash_{ZFC} \{a\} \ne \emptyset$, modus ponens
yields

$\vdash_{ZFC} (\exists y \in \{a\})(\neg(\exists x \in y)\, x \in \{a\})$ (1)

† Not so readily. We will get to this later.

‡ “The set {a} exists” is another way of saying that “{a} is a set” or that “the term {a} can be
formally introduced”.
Let B be a new constant, and add the assumption
$B \in \{a\} \land \neg(\exists x)(x \in B \land x \in \{a\})$ (2)

or, as we say when we act like Platonists, “let B be an object such as (1) tells
us exists”. From (2) we derive $B \in \{a\}$; hence, by III.5.5,

$B = a$ (3)
and
$\neg(\exists x)(x \in B \land x \in \{a\})$
which in view of (3) and III.5.5 yields
$\neg(\exists x)(x \in a \land x = a)$
i.e., (“one point rule”, I.6.2, p. 71)
$\neg\, a \in a$

We have just refuted $a \in a$ (a a free variable).
For T we only offer the informal (Platonist’s) argument: There is a $y \in T$
such that $x \in y$ is false for all $x \in T$.

Case where y = a: Then we cannot have $b \in a$.
Case where y = b: Then we cannot have $a \in b$.

So we cannot have both $a \in b$ and $b \in a$ (i.e., $a \in b \in a$).


III.8. Axiom of Collection 


In older approaches to set theory, when the formation-by-stages doctrine was not
available, how did mathematicians recover from paradoxes? “Sets”† like R and
$U_M$ were known to be “paradoxical”, and this was attributed to their enormous
size. In turn, this uncontrollable size resulted in some of these “sets” becoming
members of themselves, a situation that was (incorrectly) considered in itself
as paradoxical and a source of serious logical ills; such was the impact of
Russell’s paradox and the central presence of the “self-referential statement”‡ $x \in x$
in its derivation. For example, the “self-contradictory” (as they called it) “set of
all sets”, $V_M$, certainly satisfied, according to the analysis at that time, $V_M \in V_M$.


† We use the term “set” in quotes because at the time in the development of set theory to which this
commentary refers there was no technical distinction between sets and proper classes. Rather,
there was a distinction between “sets” and “self-contradictory” sets; they were all sets, but some
were troublemakers and were avoided.

‡ If x could talk, it would say “I am a member of myself”.
That this was considered to be a “problem” can be seen, for example, in
Kamke (1950, p. 136) where he states that all “sets”, such as (Russell’s) R and
$V_M$, “that contain themselves as elements are ‘self-contradictory’ concepts as a
matter of course, and are therefore inadmissible”. He adds that no sets that
contain themselves are known that reasonable people would “regard as meaningful
sets”.†

So the “set of all sets” was to be avoided at all costs.‡ But how do you define
“large”? In the absence of an exact definition, at the one extreme you may be
out on a witch hunt, and at the other extreme you may be the victim of error
(see III.8.1). Are all large “sets” members of themselves? (Again, see III.8.1.)

Of course, all these worries were for the mathematician who worked on the 
foundations of mathematics. The analyst, the number theorist, and the topologist 
were not worried by such issues, for they worked in “small universes”, or 
“reference sets”. That is, R (reals) or C (complex numbers) or Z would be the 
reference sets of the analyst and number theorist: all the atoms they needed 
were members of these reference sets, and any sets they needed were subsets 
of the reference sets. The topologist too would be satisfied to start with some 
“small” space (set), X, his reference, and then study subsets of X, looking for 
“open” sets, “closed” sets, “connected” sets, etc. 

In elementary expositions of set theory, even contemporary ones, the refer- 
ence set approach is sometimes misrepresented as a logical necessity for the 
avoidance of paradoxes. 

Let us conclude this discussion by proposing a new informal (metamathe- 
matical) principle, which invokes “largeness” in a relative sense (Cantor’s work 
implicitly used this principle, which was first articulated by Russell). This prin- 
ciple, on one hand, yields — by a different route — the axiom of separation; on 
the other hand it yields the important axiom of replacement. 


Principle 3 (The Size Limitation Doctrine). A class is a set if it is not “larger” 
than some known set. Correspondingly, it is not a set if it is as large as a proper 
class, for, otherwise this proper class would also be a set. 


“Largeness” we will leave undefined, but this drawback is not serious, for we 
will apply the principle (carefully, and only twice) just to “derive” two axioms. 


† The reader is reminded of the nowadays acknowledged existence of such (hyper)sets (Barwise
and Moss (1991)), though not in ZFC.

‡ Indeed, mathematicians were suspicious of even the phrase “all sets” in something as innocent
as “...let us divide all possible sets into [equivalence] classes...” (N.B. Just let us divide,
not attempt to collect into a “set”.) See Wilder (1963, p. 100) for further discussion, where he
speculates whether the “concept” of “all sets” might be as “self-contradictory” as the concept of
“the set of all sets”.
After this has been accomplished, we will forget Principles 0-3 and always
defer to the axioms.


III.8.1 Example. Let B be a set and let $A \subseteq B$. Then certainly A is not larger
than B, so A is a set by Principle 3. Thus separation (see the class form of
the schema, III.4.10) follows from the doctrine of size limitation as much as it
follows from that of set formation by stages.


Next, let $U'$ be the class of all singletons. Is this class “large” (hence proper),
or is it “small” (hence a set)? This example appears in Wilder (1963, p. 100)
(see in particular the closing remarks prior to his 4.1.2), where the argument
implies that this “set” is not “self-contradictory” (what we now call a “proper
class”), for, after all, it is far from containing “all sets”. In fact, in 4.1.2 (loc.
cit.) the “cardinal number 1” is identified with the “set” of all singletons ($U'$)
without any adverse comment.

Well, it turns out that $U'$ is a proper class, for it has the same size as $U_M$, as we
can readily see from the fact that each $x \in U_M$ corresponds to a unique $\{x\} \in U'$
and vice versa. Thus, as a “set”, $U'$ would be every bit as “self-contradictory”
as $U_M$. Incidentally, we must wonder to what extent the fact that, as a “set”, $U'$
clearly satisfied $U' \notin U'$ made it more acceptable than $U_M$ back then.


Now consider a set A. Let us next “replace” every element $x \in A$ by some other
object $x'$ (set or urelement).†

Evidently, the resulting class (let us call it $A'$) is not larger than the original
(and could very well be smaller, for we might have replaced several $x \in A$ by
the same object); hence, by Principle 3, $A'$ is a set. This is the principle of (it
goes under several names) replacement or substitution or collection, and it is
very important in ZFC.
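On finite sets the size claim is easy to observe. In this sketch, `replace_elements` is a hypothetical helper that replaces each x by f(x); the sample set and the replacing rule are arbitrary choices made for illustration.

```python
def replace_elements(A, f):
    """Replace every element x of A by f(x) and collect the results."""
    return frozenset(f(x) for x in A)

A = frozenset(range(10))
image = replace_elements(A, lambda x: x % 3)  # several x share one replacement

print(sorted(image))
print(len(image) <= len(A))  # the result is never larger than the original
```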


We prefer not to use the name “substitution” for this nonlogical axiom, for
that would clash with our use of the name for the logical axiom
$\mathcal{A}[x \leftarrow t] \to (\exists x)\mathcal{A}$

We will adhere to the name “collection”.

Below we state it as an axiom in the formal language. In the next section,
once the notions of relation and function have been formalized, we will give
a very simple version of the axiom. □


† We use “replace” in the weak sense, where it is possible that for one or more $x \in A$ the replacing
object is the same as the replaced object.
III.8.2 Axiom (Schema of Collection or Replacement). For any formula
$\mathcal{P}[x, y]$,

$(\forall x \in A)(\exists y)\mathcal{P}[x, y] \to (\exists z)(\forall x \in A)(\exists y \in z)\mathcal{P}[x, y]$ (1)

where A is a free variable.


III.8.3 Remark. (I) In any specific instance of the axiom (schema) of collection
the formula $\mathcal{P}[x, y]$ is the “agent” that effects the replacements: The hypothesis
ensures that for each $x \in A$, $\mathcal{P}$ suggests a y (maybe it has more than one
suggestion), depending on x, as a possible replacement.

The conclusion says that there is a set,† z, which contains, instead of each x
that was originally in A, one (or more) replacement(s), y, among the possibly
many that were suggested by $\mathcal{P}$. (All the suggestions were made to the left of
“$\to$”.)

There is a small difficulty here: In the formal statement adopted in III.8.2,
where we have allowed more than one possible candidate y to replace each x,
we run at once into a size and a “definability” problem: Obviously, if we are
going to argue that the size of z is small (and hence z is a set) we have to be
able to

(a) either choose a unique replacement y for each $x \in A$ (and $\mathcal{P}[x, y]$ cannot
help us here; we have to do the choosing), or

(b) choose a “very small number” of replacements y for each $x \in A$, i.e., cut
down the size of the class of replacement values for each x, so that the
size of z is not substantially different from that of A.‡

If we were to take approach (a), then we would need a mechanism to effect
infinitely many choices, one out of each class $A_x = \{y : \mathcal{P}[x, y]\}$, thus in effect
turning the hypothesis into $(\forall x \in A)(\exists! y)\mathcal{Q}[x, y]$ (where $\mathcal{Q}[x, y] \to \mathcal{P}[x, y]$,
for all $x \in A$) so that we could benefit from the size argument preceding III.8.2.
However, this would require (a strong form of) the axiom of choice, the axiom
that says, in effect, “don’t worry if you cannot come up with a well-defined
method to form a set consisting of one element out of each set in a (set) family
of sets; such a set exists anyhow”.


† Well, not exactly. It says that a formal object exists, but this object could well be an atom. Since
we will prove (III.8.4) an equivalent statement to (1), which explicitly asks that z be a set, we
can pretend in this discussion that (1) already asks that z be a set, although it does so between
the lines.

‡ Clearly, it is not “safe” to collect into z all possible y that $\mathcal{P}[x, y]$ yields for each x. For example,
if $\mathcal{P}[x, y] \equiv x \subseteq y$ and $A = \{\emptyset\}$, then allowing all the y that $\mathcal{P}$ yields for $x \in A$ we would end
up with $z = U_M$, not a set.
We can do better than that (avoiding the axiom of choice, which we have
not formally introduced yet) if we allow ourselves to put in z possibly more
than one y that satisfy $\mathcal{P}[x, y]$ for a given $x \in A$, that is, approach (b). We
do this as follows: To show (informally) that a set z as claimed by the formal
axiom exists, and that therefore the axiom is “really true”, let us consider, for
each $x \in A$, all the y such that $\mathcal{P}[x, y]$ is true which are built at the earliest
possible stage. There is just one such stage for each x, call it $\Sigma_x$. Now the class
of all such y, call it $Y_x$, is a set, for all its elements are available at stage $\Sigma_x$,
and there certainly is a stage after $\Sigma_x$ (at such a stage, $Y_x$ is formed as a set).

Thus, for each $x \in A$ we ended up with a unique set $Y_x$. Using the informal
analysis prior to the axiom, there is a set B that contains exactly all the $Y_x$. It
is clear now that we can “well-define” z: $z = \bigcup B$ will do, and is a set by the
union axiom.
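Approach (b) can be imitated on a finite pool of candidates. Everything named here is an assumption made for illustration: sets are frozensets, `rank` stands in for "stage of construction", and `candidates` is a finite list replacing the real (proper-class-sized) universe. For each x the sketch keeps only the witnesses of earliest stage, forms the family B of these sets, and takes its union as z.

```python
def rank(obj):
    """Stand-in for the stage of an object: atoms and the empty set get 0."""
    if not isinstance(obj, frozenset) or not obj:
        return 0
    return 1 + max(rank(m) for m in obj)

def earliest_witnesses(x, candidates, P, stage):
    """The set Y_x of all y with P(x, y) that appear at the earliest possible stage."""
    witnesses = [y for y in candidates if P(x, y)]
    if not witnesses:
        raise ValueError("the hypothesis of collection fails for this x")
    first = min(stage(y) for y in witnesses)
    return frozenset(y for y in witnesses if stage(y) == first)

def collect(A, candidates, P, stage):
    """z is the union of the family B = {Y_x : x in A}."""
    B = frozenset(earliest_witnesses(x, candidates, P, stage) for x in A)
    return frozenset().union(*B)

empty = frozenset()
one = frozenset({empty})
two = frozenset({empty, one})

P = lambda x, y: x <= y  # "x is a subset of y"
A = frozenset({empty, one})
z = collect(A, [empty, one, two], P, rank)
print(all(any(P(x, y) for y in z) for x in A))  # every x has a replacement in z
```

Collecting all witnesses instead of the earliest ones would, in the real universe, run into the size problem described above; restricting to one stage per x is what keeps each Y_x a set.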

(II) The hypothesis part of the axiom is usually stated in stronger terms,
viz., $(\forall x \in A)(\exists! y)\mathcal{P}[x, y]$,† and in that format it usually goes under the name
replacement axiom. The present form (mostly known as the collection axiom,
e.g., Barwise (1975)) is clearly preferable, for to apply it we have to work
less hard to recognize that the hypothesis holds. All the various formulations
of collection/replacement are equivalent in ZF (even without the “C”). Some
other forms besides the ones stated so far are the following, where we are using
set term notation in the interest of readability:

(1) Bourbaki (1966b):
$(\forall x)(\exists z)(\forall y)(\mathcal{P}[x, y] \to y \in z) \to (\forall A)\, Coll_y (\exists x \in A)\mathcal{P}[x, y]$
or in more suggestive notation
$(\forall x)(\exists z)\{y : \mathcal{P}[x, y]\} \subseteq z \to (\forall A)\, Coll_y (\exists x \in A)\mathcal{P}[x, y]$
(2) Shoenfield (1967):
$(\forall x)(\exists z)(\forall y)(\mathcal{P}[x, y] \leftrightarrow y \in z) \to (\forall A)\, Coll_y (\exists x \in A)\mathcal{P}[x, y]$
or in more suggestive notation‡
$(\forall x)(\exists z)\{y : \mathcal{P}[x, y]\} = z \to (\forall A)\, Coll_y (\exists x \in A)\mathcal{P}[x, y]$

† Recall that $(\exists! x)\mathcal{R}$ says that there is a unique x satisfying $\mathcal{R}$. That is,
$(\exists x)(\mathcal{R}[x] \land (\forall y)(y \ne x \to \neg\mathcal{R}[y]))$.
‡ The “suggestive” notation in (1) above is 100% faithful to the formal version, since, by III.3.6,
$(\forall y)(\mathcal{P}[x, y] \to y \in z)$ is provably equivalent to $\{y : \mathcal{P}[x, y]\} \subseteq z$ (see also III.1.5, p. 116). Not
so for the suggestive rendering of (2) in the presence of atoms. For example, on the assumption
$U(z)$, $(\forall x)(\neg x = x \leftrightarrow x \in z)$ is not equivalent to $\{x : \neg x = x\} = z$. Well, on one hand

(3) Levy (1979):

$(\forall x)(\forall y)(\forall y')(\mathcal{P}[x, y] \land \mathcal{P}[x, y'] \to y = y') \to (\forall A)\, Coll_y (\exists x \in A)\mathcal{P}[x, y]$

We can readily prove (III.8.12 below) that collection implies all these alternative
forms. While the converse is true, it will have to wait until we can formalize
“stages” and thus formalize the argument we have used in (I) above to show
that collection is “really true”.

(III) We have restricted the way in which sets become available, namely, 
requiring that they be built in stages, or that they be not much larger than their 
“parents” (i.e., the sets that we have used to build them). In the process, we 
developed (most of, but not all yet) the ZFC axioms, as they flow from these 
doctrines, with the apparent result of managing to escape from the paradoxes 
and antinomies of the past. 

Thus, despite the lack of a (meta)proof of the consistency for ZFC, we are 
doing well so far. But is this apparent success at no cost? Have we got “enough 
sets” in this restricted axiomatic set theory to mirror what we normally do 
in everyday mathematics? Put another way, do we have enough stages of set 
construction in order to build sets that are as complicated as the various branches 
of mathematics require them to be? 

Of course, this is not a quantitatively precise question, and it will not get 
a quantitatively precise answer. However, the answer will hopefully satisfy us 
that we are doing well on this count too. 


Imagine two mathematicians who are playing the following game: They
have a large and complicated set, A, to start with. They take turns, each taking
an $x \in A$, “making his move”, and then discarding x. A “move” consists of
proposing the wildest, most complicated set of one’s experience that one can
think of on the spur of the moment: $S_x$. Of course, at each move each player is
doing his best to utterly demolish the morale of his opponent and also to better
his own effort at his previous move.

At the end of the game, we have a class of all the $S_x$, which is a set by
collection. Now, the stage at which this class was built as a set is beyond the
wildest imagination of our two friends; otherwise one of them would have
proposed some set built at that stage during the game.


Shoenfield (1967) does not employ atoms, so that our rendering of (2) captures exactly what this
form of collection “says” in loc. cit. On the other hand, this is a moot point, for we prove (III.8.12)
that the formal version (2), even in the presence of atoms, implies version (3) without adding
the qualifier “$\neg U(z) \land$” before “$(\forall y)$”. In turn, we find out later that form (3) implies collection.
In short, versions (1) and (2), exactly as stated, are equivalent to our collection.
Put another way, we cannot “stretch” an “infinite” set A into a proper class
by the device of replacing each of A’s elements with horrendously complicated
sets (i.e., sets that are built extremely late in the stage hierarchy) in an effort
to run out of stages. Starting with $A \in U_M$, no matter how far we stretch it, we
still end up inside $U_M$. Therefore, we do have a lot of stages. Equivalently, our
“universe” is “very large”.

There is an important observation to be made here: The reason that we have
used the size doctrine to justify collection/replacement is, intuitively, precisely
the result of the game above. We felt that we could not apply Principle 2
(p. 102) reliably, or convincingly, towards arguing that “we could imagine” that
a stage existed after all the stages for the construction of all the sets $S_x$ (our two
colleagues could not imagine either).

The reader is referred to Manin (1977, p. 46), where he states that, in
the context of the doctrine of set formation by stages, the justification of the
collection axiom goes beyond the “usual intuitively obvious”.


III.8.4 Remark (a More Verbose Collection). We “parse” here (just as we
did for the axiom of union in III.6.7) the collection statement, extracting in the
process more information from the axiom than it seems to be stating.

(I) First off, we never said that A has to be a set. Indeed, III.8.2 is equivalent
to

$\neg U(A) \to (\forall x \in A)(\exists y)\mathcal{P}[x, y] \to (\exists z)(\forall x \in A)(\exists y \in z)\mathcal{P}[x, y]$ (2)

This is because (2) is a tautological consequence of (1) in III.8.2 on the
one hand. On the other hand, proof by cases with the help of

$\vdash_{ZFC} U(A) \to (\forall x \in A)(\exists y)\mathcal{P}[x, y] \to (\exists z)(\forall x \in A)(\exists y \in z)\mathcal{P}[x, y]$ (3)

combines with (2) to derive collection as originally stated. Why is (3)
valid? We can prove the simpler

$\vdash_{ZFC} U(A) \to (\exists z)(\forall x \in A)(\exists y \in z)\mathcal{P}[x, y]$ (4)

from which (3) follows tautologically: Well, assume $U(A)$. Then $\neg x \in A$
by III.1.3, from which

$x \in A \to (\exists y \in z)\mathcal{P}[x, y]$

by tautological implication. Generalization followed by an invocation of
the substitution axiom (and modus ponens) finally yields

$(\exists z)(\forall x \in A)(\exists y \in z)\mathcal{P}[x, y]$
Thus, we do not need to worry whether or not the variable A appearing in
the collection axiom is a set.

(II) Next we prove that

$(\forall x \in A)(\exists y)\mathcal{P}[x, y] \to (\exists z)(\neg U(z) \land (\forall x \in A)(\exists y \in z)\mathcal{P}[x, y])$ (5)

is equivalent to collection. It is trivial that (5) implies collection, so we
concentrate on the direction where collection implies (5). We use the
deduction theorem, assuming

$(\forall x \in A)(\exists y)\mathcal{P}[x, y]$ (6)

under three cases: $U(A)$; $\neg U(A)$ and $A \ne \emptyset$; $A = \emptyset$.
Of course, $\vdash U(A) \lor \neg U(A) \land (A = \emptyset \lor A \ne \emptyset)$.

• $A = \emptyset$ or $U(A)$. This yields $\neg x \in A$ (x free; see III.3.5 in case
$A = \emptyset$); thus

$x \in A \to (\exists y \in \emptyset)\mathcal{P}[x, y]$

by tautological implication. Another tautological implication and
$\vdash \neg U(\emptyset)$ yield

$\neg U(\emptyset) \land (x \in A \to (\exists y \in \emptyset)\mathcal{P}[x, y])$

Following this up with generalization (and distribution of $\forall$ over $\land$,
noting the absence of free x in $\neg U(\emptyset)$), we have

$\neg U(\emptyset) \land (\forall x \in A)(\exists y \in \emptyset)\mathcal{P}[x, y]$
Thus, by the substitution axiom
$(\exists z)(\neg U(z) \land (\forall x \in A)(\exists y \in z)\mathcal{P}[x, y])$

Note. Neither collection nor (6) was needed in this case.

• $\neg U(A)$ and $A \ne \emptyset$. The assumption amounts to $(\exists y)\, y \in A$. We argue
by auxiliary constant. Let B be a new constant, and assume

$B \in A$ (7)
By (6) collection yields

$(\exists z)(\forall x \in A)(\exists y \in z)\mathcal{P}[x, y]$ (8)
Add yet another new constant, C, and assume
$(\forall x \in A)(\exists y \in C)\mathcal{P}[x, y]$ (9)
Specialization of (9) using (7) and modus ponens yields
$(\exists y \in C)\mathcal{P}[B, y]$
Hence
$(\exists y)\, y \in C$
by $\exists$-monotonicity (I.4.23). Thus (by III.1.3)
$\neg U(C)$ (10)
(9) and (10) tautologically imply $\neg U(C) \land (\forall x \in A)(\exists y \in C)\mathcal{P}[x, y]$,
which by the substitution axiom gives
$(\exists z)(\neg U(z) \land (\forall x \in A)(\exists y \in z)\mathcal{P}[x, y])$

(III) Finally, collection is equivalent to
$\neg U(A) \to (\forall x \in A)(\exists y)\mathcal{P}[x, y] \to (\exists z)(\neg U(z) \land (\forall x \in A)(\exists y \in z)\mathcal{P}[x, y])$ (11)

for (11) trivially implies (2), while (2) implies (11) using the two cases
$A = \emptyset$ and $A \ne \emptyset$ exactly as we did above.


III.8.5 Remark (A Note on Nonlogical Schemata and Defined Symbols).
By I.6.1 and I.6.3, the addition of defined predicate, function, and constant
symbols to any language/theory results in a conservative extension of the theory,
that is, any theorem of the new theory over the original language is also provable
in the original theory. Moreover, any formula $\mathcal{A}$ of the extended language can
be naturally transformed back into a formula $\mathcal{A}^*$ of the old language (by
eliminating all the defined symbols), so that

$\mathcal{A} \leftrightarrow \mathcal{A}^*$ (1)

is provable in the extended theory.

There is one potential worry about the presence of nonlogical schemata
(such as the separation, foundation, and collection axiom schemata) that we
need to address: Nonlogical axioms and schemata are specific to a theory and
its basic language, i.e., the language prior to any extensions by definitions. For
example, the collection schema III.8.2 (p. 163) is a “generator” that yields a
specific nonlogical axiom (an instance of the schema) for each specific formula,
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over the basic language Lge, that we care to substitute into the metavariable 
F. There is no a priori promise that the schema “works” whenever we replace 
the syntactic variable 7 [x, y] by a specific formula, say “7”, over a language 
that is an extension of Ls by definitions. 


For example,† do we have the right to expect the provability of
(∀x ∈ A)(∃y)y = t[x] → (∃z)(∀x ∈ A)(∃y ∈ z)y = t[x]


in the extended theory, if the term t contains defined function or constant
symbols?

Indeed we do, for let us look, in general, at an instance of collection obtained
in the extended language by substituting the specific formula 𝒜, which may
contain defined symbols, into the syntactic variable 𝒫:


(∀x ∈ A)(∃y)𝒜[x, y] → (∃z)(∀x ∈ A)(∃y ∈ z)𝒜[x, y] (2)


We argue that (2) is provable in the extended theory; thus the axiom schema is
legitimately usable in any extension by definitions of set theory over L_Set.


Following the technique of symbol elimination given in Section I.6 (cf. I.6.4,
p. 73), eliminating symbols at the atomic formula level, we obtain the
following version of (2), in the basic language L_Set. This translated version has
exactly the same form as (2) (i.e., of collection), namely


(∀x ∈ A)(∃y)𝒜*[x, y] → (∃z)(∀x ∈ A)(∃y ∈ z)𝒜*[x, y]


Thus, being a collection schema instance over the basic language, it is an
axiom of set theory, and hence also of its extension (by definitions).

Now, by (1), the equivalence theorem yields the following theorem of the
extended theory:


((∀x ∈ A)(∃y)𝒜[x, y] → (∃z)(∀x ∈ A)(∃y ∈ z)𝒜[x, y])
↔
((∀x ∈ A)(∃y)𝒜*[x, y] → (∃z)(∀x ∈ A)(∃y ∈ z)𝒜*[x, y])


Hence (2) is a theorem of the extended theory as claimed.
The exact same can be said of the other two schemata (foundation,
separation).‡


† This scenario materializes below, in III.8.9.

‡ One can rethink the axioms, for example adopting Bourbaki’s collection instead of our III.8.2,
so that the separation schema becomes redundant. We have already promised to prove in due
course that foundation need not be a schema. However, it turns out that we cannot eliminate all
schemata. It is impossible to have a finite set of axioms equivalent to the ZFC axioms. We prove
this result in Chapter VII.




III.8.6 Example (Informal). We often want to collect into a set objects that
are more complicated than [“values” of] variables, subject to a condition being
true. For example, we often write things such as


(1) {n² : n ∈ N},

(2) {x + y : x ∈ R ∧ y ∈ R ∧ x² + y² = 1},

(3) {(x, y) : x ∈ R ∧ y = 2}, where “(x, y)” is the “ordered pair” (more on
which shortly) of the Cartesian coordinates of a point on the plane,

(4) {(x, y) : x ∈ R}.


We are clear on what we mean by these shorthand notations. First off, for
example, notation (1) cannot possibly be obtained in any manner by substitution
from something like {x : ... x ...}, since the “x” in a set term {x : 𝒫} is bound.
What we do mean is that we want to collect all objects that have the form “n²”
for some n in N. That is, notation (1) is shorthand (abbreviation or argot) for


{x : (∃n)(x = n² ∧ n ∈ N)}


Similarly with (2)-(4). (4) is interesting in that y is a free variable, or a
parameter as we often say. We get different sets for different “values” of y. The
shorthand (4) stands for the term {z : (∃x)(z = (x, y) ∧ x ∈ R)}.

The notation reviewed here is sufficiently important to motivate the definition
below.
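As an informal aside, the unabbreviation of notation (1) can be mimicked on a finite fragment with Python set comprehensions. This is only a sketch: the bounds `range(10)` and `range(100)` and the helper name `pairs_with` are assumptions made for illustration, since N and R themselves are infinite.

```python
# Finite stand-in for N (an assumption; the N of the text is infinite).
N = range(10)

# Shorthand (1): {n^2 : n in N}.
squares_shorthand = {n * n for n in N}

# Unabbreviated form: {x : (exists n)(x = n^2 and n in N)}.
# To run the existential test we also need a finite search space for x.
candidates = range(100)
squares_expanded = {x for x in candidates if any(x == n * n for n in N)}


def pairs_with(y, xs=(0, 1, 2)):
    """Shorthand (4) on a finite sample xs: {(x, y) : x in xs}.

    Here y is a parameter: different values of y give different sets.
    """
    return {(x, y) for x in xs}
```

Both comprehensions build the same set, and `pairs_with(5)` differs from `pairs_with(6)`, which is the sense in which y acts as a parameter.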


III.8.7 Informal Definition (Collecting Formal Terms). The symbol


{t[w⃗_m] : 𝒫[x⃗_n]}


where t[w⃗_m] is a formal term (cf. discussion following III.2.5), is an
abbreviation of the class term


{y : (∃x_1)(∃x_2) ··· (∃x_n)(y = t[w⃗_m] ∧ 𝒫[x⃗_n])} (1)


The variables x⃗_n explicitly quantified in (1) above are precisely the ones we
list in “[x⃗_n]” of 𝒫. We may call them “linking” variables (linking the term t
with the “condition” 𝒫) or “active” variables (Levy (1979)). All the remaining
variables other than y are free (parameters).

The notation does not always unambiguously indicate the active variables. In
such cases the context, including surrounding text, will remove any ambiguity.


III.8.8 Example. What does


⋃{t[x] : 𝒫[x]}


abbreviate? In the first instance, it abbreviates the expression


⋃{y : (∃x)(y = t[x] ∧ 𝒫[x])}


The latter abbreviates (cf. III.6.3)


{z : (∃y)((∃x)(y = t[x] ∧ 𝒫[x]) ∧ z ∈ y)} (2)
Let us simplify (2):


(∃y)((∃x)(y = t[x] ∧ 𝒫[x]) ∧ z ∈ y)
↔ (z ∈ y has no free x)

(∃y)(∃x)(y = t[x] ∧ 𝒫[x] ∧ z ∈ y)
↔ (commuting the two ∃)

(∃x)(∃y)(y = t[x] ∧ 𝒫[x] ∧ z ∈ y)
↔ (one point rule (I.6.2))

(∃x)(𝒫[x] ∧ z ∈ t[x])


Thus,†


⊢ ⋃{t[x] : 𝒫[x]} = {z : (∃x)(𝒫[x] ∧ z ∈ t[x])} (3)
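Identity (3) can be checked on a small finite instance. The following sketch is only an illustration: the domain `A`, the condition `P`, the term `t`, and the search space `universe` are hypothetical choices, and sets are modelled by Python `frozenset` values.

```python
def big_union(family):
    """Union of a family: {z : (exists y)(y in family and z in y)}."""
    return {z for y in family for z in y}


A = range(6)  # hypothetical range of the linking variable x


def P(x):
    """Hypothetical condition P[x]: x is even."""
    return x % 2 == 0


def t(x):
    """Hypothetical term t[x]: the two-element set {x, x + 1}."""
    return frozenset({x, x + 1})


# Left hand side of (3): the union of {t[x] : P[x]}.
lhs = big_union({t(x) for x in A if P(x)})

# Right hand side of (3): {z : (exists x)(P[x] and z in t[x])},
# with z searched over a finite universe (an assumption).
universe = range(10)
rhs = {z for z in universe if any(P(x) and z in t(x) for x in A)}
```

With these choices both sides come out as {0, 1, 2, 3, 4, 5}.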


† Note that since we are dealing with abbreviations, this is a theorem of pure logic.


III.8.9 Proposition. The class {t[x] : x ∈ A} (A a free variable) is a set, that is
(using III.8.7),


⊢_ZFC Coll_y((∃x ∈ A)y = t[x]) (4)


Only x, not A, is the linking variable. We could have written {t[x] : (x ∈ A)[x]}
to indicate this.


Proof. By collection,
⊢_ZFC (∃z)(∀x ∈ A)(∃y ∈ z)y = t[x] (5)
since


(∀x ∈ A)(∃y)y = t[x]


is a theorem of pure logic deduced from t = t (using substitution, tautological
implication, and generalization, in that order). Arguing by auxiliary constant,
we add a new constant B and the assumption


(∀x)(x ∈ A → (∃y)(y ∈ B ∧ y = t[x]))
The one point rule and specialization yield
x ∈ A → t[x] ∈ B (6)
We can now prove that
((∃x ∈ A)y = t[x]) → y ∈ B (7)


which will settle (4) by III.3.6.
We assume the hypothesis in (7); indeed, we go a step further: We add a new
constant C and the assumption


C ∈ A ∧ y = t[C]
Thus
C ∈ A (8)
and
y = t[C] (9)
(6) and (8) yield t[C] ∈ B. From (9), y ∈ B.


III.8.10 Corollary. ⊢_ZFC Coll_z((∃x ∈ A)z ∈ t[x]).


Proof. Apply III.6.8 to {t[x] : x ∈ A} (a set, by III.8.9), and use (3) above
(in III.8.8).


III.8.11 Corollary. {t[x, y] : x ∈ A ∧ y ∈ B} is a set, where A and B are free
variables (and x and y are active).
Formally, ⊢_ZFC Coll_z((∃x ∈ A)(∃y ∈ B)z = t[x, y]).


Proof. We will establish


⊢ {t[x, y] : x ∈ A ∧ y ∈ B} = ⋃{{t[x, y] : y ∈ B} : x ∈ A} (1)


from which the corollary follows by two applications of III.8.9 followed by an
application of union. As for (1), we transform the right hand side to the left hand
side by eliminating abbreviations. The “=” instances below use III.4.1(ii):


⋃{{t[x, y] : y ∈ B} : x ∈ A}
= (by III.8.8(3))

{z : (∃x ∈ A)z ∈ {t[x, y] : y ∈ B}}
= (by III.8.7)

{z : (∃x ∈ A)(∃y ∈ B)z = t[x, y]}
= (by III.8.7 again)

{t[x, y] : x ∈ A ∧ y ∈ B}


By commutativity of ∧, (1) yields
⊢ {t[x, y] : x ∈ A ∧ y ∈ B} = ⋃{{t[x, y] : x ∈ A} : y ∈ B}

III.8.12 Proposition. In the presence of the ZFC axioms that we have
introduced so far (less collection, except when explicitly assumed as (1) below) we
have the following chain of implications (stated conjunctively): (1) → (2) →
(3) → (4) → (5), where the statements are


(1) collection, version III.8.2,

(2) (∀x)(∃z)(∀y)(𝒫[x, y] → y ∈ z) → (∀A)Coll_y(∃x ∈ A)𝒫[x, y],

(3) (∀x)(∃z)(∀y)(𝒫[x, y] ↔ y ∈ z) → (∀A)Coll_y(∃x ∈ A)𝒫[x, y],

(4) (∀x)(∀y)(∀y′)(𝒫[x, y] ∧ 𝒫[x, y′] → y = y′) → (∀A)Coll_y(∃x ∈ A)𝒫[x, y],
(5) (∀x ∈ A)(∃!y)𝒫[x, y] → (∃z)(∀x ∈ A)(∃y ∈ z)𝒫[x, y].


Proof. (1) → (2): We assume (1) (collection version III.8.2). To prove (2)
assume the hypothesis. Hence (specialization)


(∃z)(∀y)(𝒫[x, y] → y ∈ z)


We add a new constant B and let


(∀y)(𝒫[x, y] → y ∈ B) (i)
By III.3.6,†
Coll_y 𝒫[x, y]


It follows that {y : 𝒫[x, y]} can be introduced formally as a set term. Hence,
by III.8.10,


Coll_y((∃x ∈ A)y ∈ {y : 𝒫[x, y]})
In short, Coll_y(∃x ∈ A)𝒫[x, y]; thus
(∀A)Coll_y(∃x ∈ A)𝒫[x, y]


† Observe how we did not need to insist that B is a set. This issue was the subject of a footnote on
p. 164.


(2) → (3): We assume (2). To prove (3), assume the hypothesis. Hence
(specialization)


(∃z)(∀y)(𝒫[x, y] ↔ y ∈ z)
We add a new constant B and let
(∀y)(𝒫[x, y] ↔ y ∈ B)


Tautological implication followed by an invocation of ∀-monotonicity (I.4.24,
p. 52) yields (i) above.
(3) → (4): We assume (3). To prove (4), assume the hypothesis. Hence


𝒫[x, y] ∧ 𝒫[x, y′] → y = y′ (ii)
We also record the tautology
(∃y)𝒫[x, y] → (∃y)𝒫[x, y] (iii)


By III.2.4 (p. 122: (10), (11), (18)), (ii) and (iii) allow us to introduce a new
function symbol, f_𝒫, into the language, and the defining axiom


(∃y)𝒫[x, y] ∧ 𝒫[x, f_𝒫[x]] ∨ ¬(∃y)𝒫[x, y] ∧ f_𝒫[x] = ∅ (iv)
into the theory. From (iv) one deduces (corresponding to (19′) and (20) of III.2.4)
(∃y)𝒫[x, y] → (𝒫[x, y] ↔ y = f_𝒫[x]) (v)

and


¬(∃y)𝒫[x, y] → f_𝒫[x] = ∅ (vi)


We can now prove
(∀x)(∃z)(∀y)(𝒫[x, y] ↔ y ∈ z) (vii)


We have two cases:


Case of (∃y)𝒫[x, y]. By (v),
𝒫[x, y] ↔ y ∈ {f_𝒫[x]}
By the substitution axiom,
(∃z)(∀y)(𝒫[x, y] ↔ y ∈ z) (vii′)


Case of ¬(∃y)𝒫[x, y]. Thus, (∀y)¬𝒫[x, y]; hence ¬𝒫[x, y] is
derivable. By tautological implication, 𝒫[x, y] → y ∈ ∅.

Conversely, y ∈ ∅ → 𝒫[x, y] is a tautological consequence of the theorem
¬y ∈ ∅. Thus,


𝒫[x, y] ↔ y ∈ ∅


and we derive (vii′) once more, by the substitution axiom. Proof by cases now
yields (vii′) solely on the hypothesis (∀x)(∀y)(∀y′)(𝒫[x, y] ∧ 𝒫[x, y′] →
y = y′); hence we have (vii) by generalization.


Having settled (vii), we next obtain
(∀A)Coll_y(∃x ∈ A)𝒫[x, y]


by our hypothesis (3).
(4) → (5): We assume (4). To prove (5), assume the hypothesis, that is,


(∀x ∈ A)(∃!y)𝒫[x, y]
which entails
(∀x)(x ∈ A → (∃y)𝒫[x, y]) (viii)

and

x ∈ A → 𝒫[x, y] ∧ 𝒫[x, y′] → y = y′ (ix)
For convenience we let

𝒬[x, y] ≡ x ∈ A ∧ 𝒫[x, y] ∨ x ∉ A ∧ y = ∅

Work already done in III.2.4 yields, because of (ix),


𝒬[x, y] ∧ 𝒬[x, y′] → y = y′
Thus, by hypothesis (4), we have derived
(∀z)Coll_y(∃x ∈ z)𝒬[x, y]

Hence

Coll_y(∃x ∈ A)𝒬[x, y] (x)
by specialization. Expanding 𝒬 and using the tautology

(x ∈ A ∧ (x ∈ A ∧ 𝒫[x, y] ∨ x ∉ A ∧ y = ∅)) ↔ x ∈ A ∧ 𝒫[x, y]

the equivalence theorem yields
(∃x)(x ∈ A ∧ (x ∈ A ∧ 𝒫[x, y] ∨ x ∉ A ∧ y = ∅)) ↔ (∃x)(x ∈ A ∧ 𝒫[x, y])
Hence, from (x),

Coll_y(∃x ∈ A)𝒫[x, y]
Let then (a formal definition of “B”† introduced just for notational convenience)

B = {y : (∃x)(x ∈ A ∧ 𝒫[x, y])} (xi)


† Of course, “B” is a name for a term, and what one really defines formally here is a function f,
by f(A, ...) = {y : (∃x)(x ∈ A ∧ 𝒫[x, y])}, where “...” are all those free variables present
that we do not care to mention. In practice, “let B be defined as ...” is all one really cares to say.


Let also w ∈ A.


By (viii) we get (∃y)𝒫[w, y], which allows us to add a new constant C and
the assumption


𝒫[w, C] (xii)
from which w ∈ A ∧ 𝒫[w, C] by tautological implication. Therefore
(∃x)(x ∈ A ∧ 𝒫[x, C])
and, by (xi),
C ∈ B (xiii)
Now (xii) and (xiii) yield C ∈ B ∧ 𝒫[w, C]; thus
(∃y)(y ∈ B ∧ 𝒫[w, y])
The deduction theorem and hypothesis w ∈ A yield
w ∈ A → (∃y ∈ B)𝒫[w, y]
Hence
(∀x ∈ A)(∃y ∈ B)𝒫[x, y]
Thus (substitution axiom)


(∃z)(∀x ∈ A)(∃y ∈ z)𝒫[x, y]


which is the conclusion part of (5).


III.8.13 Corollary. Each of the versions (2)-(5) of collection above is a
theorem schema (hence, for each specific 𝒫, a theorem) of ZFC.


III.8.14 Remark. (I) Thus, in the sequel we may use any version of collection,
as convenience dictates.

(II) Intuitively, collection versions (2)-(5) have a hypothesis that guarantees
that for each “value” of x the corresponding number of values of y that satisfy
𝒫[x, y] is sufficiently “small” to fit into a set. In fact, in case (4) at most one
y-value is possible for each x-value, while in case (5) exactly one is possible,
albeit on the restriction that x is varying over a set A. Thus, collecting all the
values y, for all x in a set A, yields a set under all cases (2)-(5). This set is what
we call in elementary algebra or discrete mathematics courses the image of A
under the black box 𝒫[x, y]. This black box is an agent that for each “input”
value x yields zero or more (but not too many) “output” values y.

We note that collection in its version III.8.2 does not have the “not too many”
restriction on the number of outputs y for each x, and that is why one is selective
when collecting such outputs into a set. The conclusion of Axiom III.8.2,


(∃z)(∀x ∈ A)(∃y ∈ z)𝒫[x, y]


allows the possibility that many outputs y need not be included in z; it says
only that some outputs are included.


(III) Collection versions (2)-(4) in III.8.12 are quite strong, even in the
absence of some of the other axioms. For example, Bourbaki (1966b) adopts the
axiom of pairing, but adopts collection version (2), and proves both separation
and union (Exercise III.14).

Shoenfield (1967) adopts separation and proves pairing and union from
collection version (3) (Exercise III.15). Finally, Levy (1979) adopts union, and
proves separation and pairing from collection version (4) (Exercise III.16).
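The contrast drawn in part (II), between the full image of A under the black box and a set z that merely witnesses the conclusion of III.8.2, can be made concrete on a finite example. The relation `P`, the set `A`, and the search space `Y` below are hypothetical, and the sketch says nothing about the infinite case that the axiom is really about.

```python
A = {1, 2, 3}
Y = range(20)  # finite search space for the outputs y (an assumption)


def P(x, y):
    """Hypothetical black box: y is a multiple of x."""
    return y % x == 0


# The image of A under the black box: {y : (exists x in A) P[x, y]}.
image = {y for y in Y if any(P(x, y) for x in A)}


def is_collection_witness(z):
    """Check the conclusion of III.8.2: (forall x in A)(exists y in z) P[x, y]."""
    return all(any(P(x, y) for y in z) for x in A)
```

Here the image is a witness, but so is the much smaller set {0}: the conclusion of III.8.2 only promises that some output is caught for each input.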
III.9. Axiom of Power Set


There is another operation on sets, which, intuitively, increases the size of a set
“exponentially”, from which fact it derives its name (see Exercise III.35 in this
connection).


III.9.1 Informal Definition (Power Class, Power Set). For any class A, P(A)
stands for {x : ¬U(x) ∧ x ⊆ A}. We read P(A) as the power class of A.
If A is a set, then P(A) is also pronounced the power set of A.


Note that what we collect in P(A) are sets.


III.9.2 Example (Informal). We compute some power classes:


P(∅) = {∅}
P({∅}) = {∅, {∅}}
P({0, 1}) = {∅, {0}, {1}, {0, 1}}


Also note that


(i) Since for every class A we have ∅ ⊆ A, it follows that ∅ ∈ P(A).
(ii) If a is a set, then a ⊆ a as well; hence we have a ∈ P(a).
(iii) For a set x, x ∈ P(a) iff x ⊆ a.
(iv) Even though U(x) → x ⊆ a (is provable), still U(x) → x ∉ P(a) (is
provable), since x must satisfy ¬U(x) for inclusion. Power classes contain no
atoms.
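The three computations of this example can be replayed for finite sets. The sketch below is an informal aid only: the function name `power_set` is our own, sets are modelled as `frozenset` values, and the integers 0 and 1 are treated as opaque members (the question of atoms versus sets is ignored).

```python
from itertools import combinations


def power_set(a):
    """Return the set of all subsets of the finite set a, as frozensets."""
    elems = list(a)
    return {
        frozenset(chosen)
        for size in range(len(elems) + 1)
        for chosen in combinations(elems, size)
    }


EMPTY = frozenset()
```

For instance, `power_set({0, 1})` has the four members listed above, and in general a finite set with n members has 2^n subsets, which is the “exponential” growth alluded to at the start of the section.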


Pause. Now P({∅}) ⊇ {∅, {∅}} by (i) and (ii) above. But have we not forgotten
to include any other subsets of {∅}? Is it really “=” (rather than “⊇”) as we have
claimed above? The definitive answer that one is tempted to give is “Obviously,†
the above is ‘=’ as stated”.


† My geometry teacher in high school used to say: “... [T]here are many proof methods: e.g., by
contradiction, by induction, etc. Among all those proof methods the most powerful is proof by
intimidation. It starts with the word ‘obviously’. Few have the courage to challenge such a proof
or to demand details ...”.


Well, let us prove the obvious, just to be sure:
We will prove the formula ¬U(x) ∧ x ⊆ {∅} → x = ∅ ∨ x = {∅}, that is,
the tautologically equivalent


¬U(x) → x ⊆ {∅} → ¬x = ∅ → x = {∅} (1)


Arguing by the deduction theorem, we assume the hypothesis, namely,
¬U(x) (2)
(∀y)(y ∈ x → y = ∅) (3)
and
¬x = ∅
By (2) and the above (see III.4.11)
(∃y)y ∈ x (4)
Let (arguing by auxiliary constant)
A ∈ x (5)
Hypothesis (3) yields A ∈ x → A = ∅. (5) now yields A = ∅; hence
∅ ∈ x (6)
by the Leibniz axiom and (5). The one point rule (III.6.2, p. 149) gives
∅ ∈ x ↔ (∀y)(y = ∅ → y ∈ x)
Thus, by (6),
(∀y)(y = ∅ → y ∈ x)
This along with (3) yields
(∀y)(y ∈ x ↔ y = ∅)
Thus x = {∅} (III.5.5).
III.9.3 Exercise. Repeat the above argument without relying on the axiom of
pairing. Thus, prove that


x ∈ P(P(∅)) ↔ x = ∅ ∨ x = P(∅)


The following is “really true”:
If A is a set, then so is P(A)


Indeed, let Σ be a stage at which A is formed as a set. Let x ∈ P(A), i.e.,
x ⊆ A. Every member of x is available before x, and hence before A, therefore
before stage Σ. Thus,
x ∈ P(A) implies that x is formed as a set at or before stage Σ (1)


Let Σ′ be a stage after Σ (we have no problem accepting that a stage exists
after a given stage). Then, by (1), P(A) is formed as a set at stage Σ′.

We could not reliably argue the above using the size limitation doctrine
(Principle 3, p. 161), for it is not intuitively clear whether an “exponential
growth” in set size is harmless. In fact, Principle 3 was introduced solely to
justify the replacement axiom, and, in Chapter V, the axiom of infinity.


We capture the above informal argument by the following power set axiom: 


III.9.4 Axiom (Axiom of Power Set).
(∃y)(∀x)(x ⊆ A → x ∈ y) (1)


where A is a free variable.


Actually, the axiom as stated above is not exactly what we have established in
the informal argument that preceded it. The axiom says a bit more than “the
power class of a set is a set”.

It is really true as stated, nevertheless. First off, the y of (1) above is
necessarily a set by x ⊆ A → x ∈ y, since (∃x)x ⊆ A is provable (A ⊆ A is a theorem
of pure logic; see III.1.5, p. 116) and hence, so is (∃x)x ∈ y by ∃-monotonicity.
Thus ¬U(y) by Axiom III.1.3. We have omitted the usual qualification “¬U(y)”
in the statement of the axiom, since it is a conclusion that the axiom forces
anyway. So the axiom says that


“There is a set y which contains as elements all x such that x ⊆ A,
without restricting x to be a set”.


Now we see why it is really true. If A is a set, then lifting the restriction from
x adds to P(A) all the urelements (see III.1.5, p. 116). Well, there is a set y as
described above, for example, M ∪ P(A).†

If on the other hand A is an atom, then a choice for y that works is M ∪ {∅}.


† We are using here the name M, which was earlier introduced formally to denote the set of all
atoms, to also name the set of all real atoms in the metatheory.


As we have done on previous occasions, here too we prefer not to assert
explicitly that the objects which our axioms claim to exist are sets (nor do we
want to unnecessarily restrict our variables to be sets). We prefer to prove this
as a consequence of the axioms. On one hand this approach is mathematically
elegant; on the other hand, and more importantly, it allows us to state our axioms
(e.g., power set above, as well as pairing, union, and collection) in a manner
that does not betray that we allow atoms; this gives us flexibility. Thus we have
chosen the statement of Axiom III.9.4 to be morphologically identical to the
statement one would make in the absence of atoms (all variables then “vary
over” sets).


III.9.5 Proposition.
⊢_ZFC Coll_x(¬U(x) ∧ x ⊆ A)


where A is a free variable.


Proof. By (1) of III.9.4 we may assume (B a new constant)
(∀x)(x ⊆ A → x ∈ B)
Hence
(∀x)(¬U(x) ∧ x ⊆ A → x ∈ B) (2)
by ∀-monotonicity, since


⊨_Taut (x ⊆ A → x ∈ B) → (¬U(x) ∧ x ⊆ A → x ∈ B)


We are done by III.3.6.


We would like now to introduce the symbol “P” formally, in the interest of 
convenience, along with its informal use. As in the cases of U, M, and J, we 
will take no notational measures to distinguish between the formal and informal 
occurrences of the symbol; we will rely instead on the context. 


III.9.6 Definition (Formal P). We introduce a function symbol, P, of arity 1,
by the defining axiom


P(A) = y ↔ (∀x)(¬U(x) ∧ x ⊆ A ↔ x ∈ y) (1)
or, equivalently


(∀x)(¬U(x) ∧ x ⊆ A ↔ x ∈ P(A)) (2)


III.9.7 Remark. (I) Once again we note that it is redundant to add “¬U(y) ∧”
in (1) (III.9.6) or “¬U(P(A)) ∧” in (2). Indeed,


⊢_ZFC (∀x)(¬U(x) ∧ x ⊆ A ↔ x ∈ y) ↔ ¬U(y) ∧
(∀x)(¬U(x) ∧ x ⊆ A ↔ x ∈ y)


To see this, note that the ← direction is a tautological implication. For the →
direction we have


¬U(x) ∧ x ⊆ A ↔ x ∈ y
Thus, since ⊢_ZFC ¬U(∅) ∧ ∅ ⊆ A, we obtain
(∃x)(¬U(x) ∧ x ⊆ A)
from which (∃x)x ∈ y by the equivalence theorem. That is (III.1.3),
¬U(y)
Similarly, (2) of III.9.6 proves
¬U(P(A)) (3)
(II) A is an arbitrary variable; thus P makes sense on atoms. Indeed,


⊢_ZFC U(A) → P(A) = {∅}


(see Exercise III.17).


III.10. Pairing Functions and Products


We now turn to the ordered pair concept, which will lead to the formalization 
(within axiomatic set theory) of the intuitive concepts of relation and function 
in the next section. We want to invent objects “(a, b)” which are meaningful 
for all sets and atoms a and b and which are mindful of order in that 


(a, b) = (a′, b′) → a = a′ ∧ b = b′ (1)


In particular, (a, a) is supposed to have two objects in it, a first a and a second
a, so it is not to be confused with {a, a} = {a}.

Some naive approaches to set theory take (a, b) to be a new kind of object
whose behaviour, i.e., (1), is “axiomatically accepted” (admittedly this is a
patently odd thing to do in a non-axiomatic approach). To proceed formally
within a framework that accepts sets and urelements as the only formal objects,
we must implement our new object as a set.
There are several implementations, the simplest one (due to Kuratowski)
being {{a}, {a, b}}, or the related {a, {a, b}}.


III.10.1 Proposition. If {a, {a, b}} = {a′, {a′, b′}}, then a = a′ and b = b′.


Proof. (Presented in a “relaxed” manner, that is, in argot. See also III.5.9,
p. 148.) By foundation, a ≠ {a, b} (otherwise a ∈ a). Thus, taking the ⊆-half of
the hypothesis,


a = a′ and {a, b} = {a′, b′} (1)
or
a = {a′, b′} and {a, b} = a′ (2)


From (2) we get a′ ∈ a ∈ a′, contradicting foundation. Therefore, case (2) is
untenable. Let us further analyze case (1), which already gives us half of what
we want, namely, a = a′.

Thus,


{a, b} = {a, b′} (3)


If a = b, then the ⊇-part of (3) gives b = b′, and we are done. Otherwise, the
⊆-part of (3) gives b = b′, and we are done again.


The pedantic way to derive a = a′ goes like this: We want

⊢_ZFC {a, {a, b}} = {a′, {a′, b′}} → a = a′

Assume the hypothesis
(∀z)(z = a ∨ z = {a, b} ↔ z = a′ ∨ z = {a′, b′})
Thus
a′ = a ∨ a′ = {a, b} ↔ a′ = a′ ∨ a′ = {a′, b′}

and

a = a ∨ a = {a, b} ↔ a = a′ ∨ a = {a′, b′}
Hence (by tautological implication and the axiom x = x)

a′ = a ∨ a′ = {a, b} (1)

and


a = a′ ∨ a = {a′, b′} (2)
which (jointly) tautologically imply


(a′ = a ∧ a = a′) ∨ (a′ = a ∧ a = {a′, b′})
∨ (a′ = {a, b} ∧ a = a′) ∨ (a′ = {a, b} ∧ a = {a′, b′}) (3)


By foundation,
¬(a′ = a ∧ a = {a′, b′})
¬(a′ = {a, b} ∧ a = a′)
and
¬(a′ = {a, b} ∧ a = {a′, b′})
which along with (3) tautologically imply
a = a′
The rest of the above proof of III.10.1 has a straightforward formalization as a
proof by cases.
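Proposition III.10.1 can be spot-checked by brute force over a small universe. In the sketch below, sets are Python `frozenset` values and the integers 0 and 1 play the role of atoms; the names `pair`, `UNIVERSE`, and `injective_on` are our own. Python values are automatically well founded, so the role of foundation in the proof is not exercised, only reflected.

```python
from itertools import product


def pair(a, b):
    """The implementation {a, {a, b}} of the ordered pair."""
    return frozenset({a, frozenset({a, b})})


EMPTY = frozenset()

# A small hypothetical universe: two atoms and a few sets.
UNIVERSE = [0, 1, EMPTY, frozenset({0}), frozenset({0, 1}), frozenset({EMPTY})]


def injective_on(universe):
    """True if pair(a, b) == pair(c, d) forces a == c and b == d on universe."""
    first_seen = {}
    for a, b in product(universe, repeat=2):
        code = pair(a, b)
        # setdefault records (a, b) the first time this code appears.
        if first_seen.setdefault(code, (a, b)) != (a, b):
            return False
    return True
```

The check also shows the point made earlier about (a, a): `pair(0, 0)` is {0, {0}}, which is different from {0}.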


III.10.2 Definition (Pairing Function and Ordered Pair). We introduce a 
new function symbol of arity 2, J, by 


J(x, y) = {x, {x, y}}


It is customary to denote the term J(x, y) by (x, y).
We call J(x, y) or (x, y) the ordered pair. We call J the pairing function.
“The” is dictated by our determination to have just one implementation of
(ordered) pair, as that is sufficient for the theory.† Indeed, one seldom needs to
remember how (x, y) is implemented, as the property expressed in III.10.1 is
all we normally need and use.


† An exception occurs in Chapter VII in our study of cardinality, where yet another pairing is
considered.


Many of the sequel’s proofs are in the “relaxed” style. We get to be formal
whenever there is danger of missing fine points in this argot.


III.10.3 Proposition. For any a, b, a′, b′, (a, b) = (a′, b′) iff a = a′ and b = b′.


Proof. The only-if part is Proposition III.10.1. The if part follows from the
Leibniz axiom.


Some will say that using the above definition for “pair” is overkill, since
foundation was needed to establish its key property in III.10.1. By contrast, a definition
of ordered pair via {{a}, {a, b}} does not require this axiom (see Exercise III.36).
This is a valid criticism for a development of set theory that is constantly
preoccupied with the question of what theorem needs what axioms. In the context
of our plan, it is a minor quibble, since we will seldom ask such questions, and
we do have foundation anyway.


We often find it convenient to extend the notion of an ordered pair to that of
an (ordered) n-tuple in general (for n ≥ 1). To this end,


III.10.4 Definition (The Ordered n-Tuple). We define by induction
(recursion) on n ≥ 1 a function, J^(n), of arity n:


J^(1)(x) =def x (Basis)
J^(n+1)(x_1, ..., x_{n+1}) =def J(J^(n)(x_1, ..., x_n), x_{n+1}) for n > 0


where x, x_1, ..., x_{n+1} are variables, and J is the pairing function of
Definition III.10.2.

It is normal practice to denote the term J^(n)(x_1, ..., x_n) (somewhat
ambiguously, since the same symbol is good for any arity†) by the symbol


⟨x_1, ..., x_n⟩


We adopt this practice henceforth and call ⟨x_1, ..., x_n⟩ an n-tuple, or n-vector,
or just vector if n is understood or unimportant. We often use the shorthand
notation ⟨x⃗_n⟩ (or ⟨x⃗⟩ if n is not important) for the n-tuple.
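The recursion of III.10.4 translates directly into a loop. This sketch assumes the `frozenset` model of the pairing function used informally above; the name `tuple_n` and the use of a variadic Python function (in place of one symbol per arity) are our own choices.

```python
def J(x, y):
    """The pairing function of III.10.2: {x, {x, y}}."""
    return frozenset({x, frozenset({x, y})})


def tuple_n(*xs):
    """Code the n-tuple of xs (n >= 1) following III.10.4.

    Basis: the 1-tuple of x is x itself.
    Step:  the (n+1)-tuple is J(n-tuple of the first n items, last item).
    """
    if not xs:
        raise ValueError("n-tuples are defined only for n >= 1")
    code = xs[0]
    for x in xs[1:]:
        code = J(code, x)
    return code
```

For example, `tuple_n(1, 2, 3)` unwinds to `J(J(1, 2), 3)`, and `tuple_n(7)` is just 7.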


III.10.5 Remark. (1) Some authors will not define n-tuples for arbitrary n,
once again avoiding the set N and inductive definitions; instead, they will
“unwind” the recursion and give a definition that goes, say, up to a 5-tuple, e.g.,


J^(1)(x) = x
J^(2)(x, y) = J(J^(1)(x), y)
J^(3)(x, y, z) = J(J^(2)(x, y), z)


and leave the rest up to the imagination, invoking “...”. The reader should not
forget that we are using n in the metalanguage. As far as the formal system is
concerned, “n” of ⟨x⃗_n⟩ is hidden in the name; it is not a variable accessible to
the formal system.


† A “real-life” function of non-fixed arity is the print function of computer programming.


(2) Following the definition, let us compute ⟨a, b⟩ using the shorthand (a, b)
for J(a, b) below:


⟨a, b⟩ = (⟨a⟩, b) by the induction step
= (a, b) by the basis step


Thus, from now on we denote the ordered pair by the symbol “⟨a, b⟩”, rather
than “(a, b)”, in the interest of notational uniformity.

(3) The ordered pair provides, intuitively, a pairing function (which
motivates the name we gave J) that, for any two objects a and b, given in that order,
“codes” them into a unique object c ( = ⟨a, b⟩) in such a way that if, conversely,
we know that a given c is a “code”, then we can uniquely (by III.10.3) “decode”
it into a and b. That is, if c is an ordered pair, then (∃!x)(∃!y)⟨x, y⟩ = c holds.
The unique x is called the first projection and the unique y is called the second
projection of c; in symbols (we are using notation due to Moschovakis), π(c)
and δ(c) respectively.† More accurately, since not all sets (and no urelements)
are valid “codes” (that is pairs; e.g., {∅} is not), we must let π and δ “return”
some standard “output” when the input c is “bad” (not a pair).‡


Let us do this formally: We record the tautology
(∃x)(∃y)⟨x, y⟩ = z → (∃x)(∃y)⟨x, y⟩ = z (1)
We next prove
(∃y)⟨x, y⟩ = z ∧ (∃y)⟨v, y⟩ = z → x = v (2)


Assume the hypothesis, and add the assumptions (by auxiliary constant)


⟨x, A⟩ = z
and
⟨v, B⟩ = z
Hence
⟨x, A⟩ = ⟨v, B⟩
Therefore
x = v
by III.10.3.


† Presumably, π for “πρώτη” (= first) and δ for “δεύτερη” (= second).

‡ Once again we will ensure that the defined function symbols, π and δ, have total interpretations.
This determination was also at play when we defined power set and, earlier on, the formal “∩”,
“∪”, and difference. We defined all these functions to act on all sets or atoms.


III.10.6 Definition (The First Projection π). By the techniques found in
III.2.4 (p. 122: (10), (11), (18)) we may now introduce a new function symbol
π of arity 1 by the axiom (we insert redundant brackets to avoid any
misunderstanding)


((∃x)(∃y)(⟨x, y⟩ = z) ∧ (∃y)(⟨π(z), y⟩ = z)) ∨
((¬(∃x)(∃y)⟨x, y⟩ = z) ∧ π(z) = ∅)


By III.2.4 (19), (20), and (19′) one directly obtains in ZFC (using III.10.6):


III.10.7 Proposition.


(∃x)(∃y)⟨x, y⟩ = z → (∃y)⟨π(z), y⟩ = z (π-19)
¬(∃x)(∃y)⟨x, y⟩ = z → π(z) = ∅ (π-20)

and
(∃x)(∃y)⟨x, y⟩ = z → ((∃y)⟨x, y⟩ = z ↔ x = π(z)) (π-19′)


A similar analysis, which we do not repeat, yields for δ:


III.10.8 Definition (The Second Projection δ). By the techniques in III.2.4
(p. 122: (10), (11), (18)) we may now introduce a new function symbol δ of
arity 1 by the axiom
((∃x)(∃y)(⟨x, y⟩ = z) ∧ (∃x)(⟨x, δ(z)⟩ = z)) ∨
((¬(∃x)(∃y)⟨x, y⟩ = z) ∧ δ(z) = ∅)


III.10.9 Proposition. The following are theorems in the presence of the defining
axiom III.10.8:


(∃x)(∃y)⟨x, y⟩ = z → (∃x)⟨x, δ(z)⟩ = z (δ-19)
¬(∃x)(∃y)⟨x, y⟩ = z → δ(z) = ∅ (δ-20)
and


(∃x)(∃y)⟨x, y⟩ = z → ((∃x)⟨x, y⟩ = z ↔ y = δ(z)) (δ-19′)


It is also notationally convenient to introduce the predicate “is an ordered 
pair” by: 


III.10.10 Definition. We introduce “OP”, a predicate of arity 1, by
OP(z) ↔ (∃x)(∃y)⟨x, y⟩ = z


We pronounce OP(z) as “z is an ordered pair”.


We have at once:
III.10.11 Proposition. The following are ZFC theorems (in the presence of the
appropriate defining axioms):
OP(z) → ⟨π(z), δ(z)⟩ = z (1)
and
¬OP(z) → π(z) = ∅ ∧ δ(z) = ∅ (2)
Proof. (2) is a direct consequence of III.10.7 and III.10.9 ((π-20) and (δ-20)).
As for (1), assume the hypothesis, OP(z). By III.10.7 (π-19),
(∃y)⟨π(z), y⟩ = z
while by III.10.9 (δ-19),
(∃x)⟨x, δ(z)⟩ = z
The above two allow us to assume (where A and B are new constants)
⟨π(z), A⟩ = z
and
⟨B, δ(z)⟩ = z (3)
Hence


⟨π(z), A⟩ = ⟨B, δ(z)⟩


from which A = δ(z) and B = π(z) by III.10.3. Thus, ⟨π(z), δ(z)⟩ = z.
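The total projections and the predicate OP can be imitated for hereditarily finite values. In this sketch sets are `frozenset` values, integers stand for atoms, and the decoding is done by exhaustive search; the helper `decode` is our own device and is not part of the formal development.

```python
EMPTY = frozenset()


def pair(x, y):
    """The ordered pair implemented as {x, {x, y}}."""
    return frozenset({x, frozenset({x, y})})


def decode(z):
    """Return (x, y) with pair(x, y) == z, or None if z is not a pair."""
    if not isinstance(z, frozenset):
        return None  # atoms are never pairs
    for x in z:
        for s in z:
            if isinstance(s, frozenset) and x in s:
                for y in s:
                    if pair(x, y) == z:
                        return (x, y)
    return None


def OP(z):
    """z is an ordered pair."""
    return decode(z) is not None


def pi(z):
    """First projection; returns the empty set on non-pairs."""
    found = decode(z)
    return found[0] if found is not None else EMPTY


def delta(z):
    """Second projection; returns the empty set on non-pairs."""
    found = decode(z)
    return found[1] if found is not None else EMPTY
```

On a genuine pair the two projections recover the components, as in (1) of III.10.11; on a “bad” input such as {0} or an atom both return the empty set, as in (2).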


It is also worth recording that
III.10.12 Corollary. The following are ZFC theorems (in the presence of the
appropriate defining axioms):


π(⟨x, y⟩) = x (3)


and
δ(⟨x, y⟩) = y (4)


Proof. We note the logical theorem ⟨x, y⟩ = ⟨x, y⟩, from which the substitution
axiom yields (∃u)(∃v)⟨u, v⟩ = ⟨x, y⟩, that is,


OP(⟨x, y⟩) (5)
For (3), III.10.7 (π-19′) yields (note dummy renaming)
OP(⟨x, y⟩) → ((∃u)⟨x, u⟩ = ⟨x, y⟩ ↔ x = π(⟨x, y⟩)) (6)
By (5) and the logical theorem (∃u)⟨x, u⟩ = ⟨x, y⟩, (6) yields


x = π(⟨x, y⟩)


The case for δ is similar.


In recursion theory (or computability, studied in volume 1), pairing functions
on the natural numbers play an important role. There are several so-called
primitive recursive pairing functions, e.g., 2^x 3^y, 2^x(2y + 1), 2^(x+y+2) + 2^(y+1),
(x + y)² + x, (x + y)(x + y + 1)/2 + x. Of these, only the last one ensures that
every n ∈ N is a “pair”, while the second one misses only the number 0.
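The two coverage claims about numeric pairing functions can be tested on an initial segment of N. The sketch below takes the last function in the form (x + y)(x + y + 1)/2 + x and the second as 2^x(2y + 1); the names `cantor`, `odd_power`, and `odd_power_decode`, and the bounds used, are our own.

```python
def cantor(x, y):
    """(x + y)(x + y + 1)/2 + x, computed with integer division."""
    return (x + y) * (x + y + 1) // 2 + x


def odd_power(x, y):
    """2^x (2y + 1)."""
    return 2 ** x * (2 * y + 1)


def odd_power_decode(n):
    """Invert odd_power for n >= 1: strip factors of 2, then halve the odd part."""
    x = 0
    while n % 2 == 0:
        n //= 2
        x += 1
    return x, (n - 1) // 2


# Codes of all pairs on the first D diagonals x + y = 0, 1, ..., D - 1.
D = 30
cantor_codes = sorted(cantor(x, d - x) for d in range(D) for x in range(d + 1))
```

The sorted list `cantor_codes` is exactly 0, 1, ..., D(D + 1)/2 − 1 with no gaps or repeats, while every n ≥ 1 decodes back through `odd_power_decode`, and 0 is never a value of `odd_power`.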


III.10.13 Proposition. For n ≥ 2 and any objects a_1, ..., a_n, ⟨a⃗_n⟩ is a set.


Proof. Exercise III.38.


III.10.14 Proposition. For n ≥ 1 and any objects a_1, ..., a_n, b_1, ..., b_n,
⟨a⃗_n⟩ = ⟨b⃗_n⟩ iff a_i = b_i for i = 1, ..., n


Proof. Exercise III.39.


III.10.15 Informal Definition (The Cartesian Product of Two Classes). For
any classes A and B, the symbol A × B, read the Cartesian product of A and B
(in that order), is an abbreviation for the class term {⟨a, b⟩ : a ∈ A ∧ b ∈ B}
(see also III.8.7).


III.10.16 Lemma. {⟨x, y⟩ : 𝒫[x, y]} = {z : OP(z) ∧ 𝒫[π(z), δ(z)]} is a
theorem schema.


Proof. By III.8.7 the left hand side abbreviates the class term
{z : (∃x)(∃y)(⟨x, y⟩ = z ∧ 𝒫[x, y])}
Thus we need to prove
(∃x)(∃y)(⟨x, y⟩ = z ∧ 𝒫[x, y]) ↔ OP(z) ∧ 𝒫[π(z), δ(z)] (1)
We note the theorem
⟨x, y⟩ = z ↔ OP(z) ∧ π(z) = x ∧ δ(z) = y (2)
Indeed, the → direction is by the Leibniz axiom and


OP(⟨x, y⟩) ∧ π(⟨x, y⟩) = x ∧ δ(⟨x, y⟩) = y


from the definition of OP and III.10.12. The ← direction is by III.10.11.


Now (2) yields, via the equivalence theorem, the first equivalence of the
following “calculation”:


(∃x)(∃y)(⟨x, y⟩ = z ∧ 𝒫[x, y])
↔ (see above)

(∃x)(∃y)(OP(z) ∧ π(z) = x ∧ δ(z) = y ∧ 𝒫[x, y])
↔ (no free x, y in OP(z))
OP(z) ∧ (∃x)(∃y)(π(z) = x ∧ δ(z) = y ∧ 𝒫[x, y])
↔ (one point rule)

OP(z) ∧ (∃x)(π(z) = x ∧ 𝒫[x, δ(z)])
↔ (one point rule)


OP(z) ∧ 𝒫[π(z), δ(z)]


We have proved (1).


III.10.17 Theorem. For any variables A and B, A × B is a set.


Proof. By III.8.11 (p. 172), ⊢_ZFC Coll_z((∃x)(∃y)(z = ⟨x, y⟩ ∧ x ∈ A ∧ y ∈ B))
for any free variables A and B.
Using the recently introduced symbols and III.10.16,
⊢_ZFC Coll_z(OP(z) ∧ π(z) ∈ A ∧ δ(z) ∈ B)


Thus, we might as well introduce the formal “×”:


III.10.18 Definition (Formal ×). In view of the above observation, we
introduce a new function symbol, ×, of arity 2 by the defining axiom


A × B = y ↔ ¬U(y) ∧ (∀z)(z ∈ y ↔ OP(z) ∧ π(z) ∈ A ∧ δ(z) ∈ B)


or


A × B = {z : OP(z) ∧ π(z) ∈ A ∧ δ(z) ∈ B}


This A × B makes sense for all variables A and B. In particular, one can
prove


U(A) → A × B = ∅
as well as


∅ × B = ∅


III.10.19 Remark. The following proof of III.10.17 for any sets A and B is
often criticized as overkill (e.g., Barwise (1975)), while Bourbaki (1966b), Levy
(1979), and Shoenfield (1967), who use the collection-based proof above,
but not Jech (1978b), just stay away from it without comment:


Let $(a, b) \in A \times B$, i.e., $a \in A$ and $b \in B$. Thus, $\{a, b\} \subseteq A \cup B$, and
therefore $\{a, b\} \in P(A \cup B)$. It follows that $\{a, \{a, b\}\} \subseteq A \cup P(A \cup B)$ and
hence

$$\{a, \{a, b\}\} \in P(A \cup P(A \cup B))$$

Thus $(a, b) \in P(A \cup P(A \cup B))$, establishing $A \times B \subseteq P(A \cup P(A \cup B))$. By
separation, $A \times B$ is a set.

An additional criticism here may be that this proof needed to know the 
implementation of “(a, b)”, while the one, based on collection, does not need 
this information. 

The objection that the proof is overkill, on the other hand, is context- 
dependent. If the foundation of set theory is going to exclude the power set 
axiom (one important set theory so restricted is that of Kripke and Platek with
or without urelements, “KP” or “KPU”; e.g., Barwise (1975)), then the objection
is justified. If on the other hand we do have the power set axiom, then we
certainly are going to take advantage of it, and we reserve the right to use any
axiom we please in our proofs.


III.10.20 Example (Informal). $\{0\} \times \{1\} = \{(0, 1)\}$ and $\{1\} \times \{0\} = \{(1, 0)\}$.
Since $(0, 1) \ne (1, 0)$, these two products are different; hence $A \times B \ne B \times A$
in general.
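
The example can be replayed on finite sets. The following is only an informal sketch: Python tuples stand in for the ordered pairs (a, b), and the helper name `cartesian` is our own, not part of the text.

```python
from itertools import product

def cartesian(A, B):
    """Return A x B as a set of pairs; Python tuples stand in for (a, b)."""
    return set(product(A, B))

left = cartesian({0}, {1})    # {0} x {1}
right = cartesian({1}, {0})   # {1} x {0}
print(left, right, left == right)
```

The two products differ because the pairs (0, 1) and (1, 0) differ, exactly as in the example.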


We conclude this section by extending (more argot) $\times$ to any finite number
of class “operands”, just as we did for $\cup$ and $\cap$ in III.4.15 (p. 142).


III.10.21 Informal Definition. Given classes $A_i$ for $i = 1, \ldots, n$,
xX A; and xX = A; and XX A; 
i=l l<i<n 


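are alternative abbreviations for
$$\{(\vec x_n) : x_1 \in A_1 \land \cdots \land x_n \in A_n\} \qquad (*)$$

We avoid “$\cdots$” by the inductive definition

$\mathop{\times}_{i=1}^{1} A_i$ stands for $A_1$

$\mathop{\times}_{i=1}^{n+1} A_i$ stands for $\big(\mathop{\times}_{i=1}^{n} A_i\big) \times A_{n+1}$

One often writes $A_1 \times \cdots \times A_n$ rather than $\mathop{\times}_{i=1}^{n} A_i$.
If all the $A_i$ are equal, say to A, then we will usually write $A^n$. We let $A^1$
mean A.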


That the “...” notation, in $(*)$ of III.10.21 above, and the inductive definition
coincide can be verified using III.10.4.

We can, of course, use the metalogical “=” in lieu of “stands for”. For example,
the logical theorem $A_1 = A_1$ leads to the logical theorem $\mathop{\times}_{i=1}^{1} A_i = A_1$
on replacing the left $A_1$ by the “abbreviation” $\mathop{\times}_{i=1}^{1} A_i$.


III.10.22 Informal Definition. We often have a “rule” which for each $a \in I$
“gives” us a class $A_a$. This simply means that for some formula $\mathcal{A}(x, y)$ (the
“rule”) we consider the classes $\{x : \mathcal{A}(a, x)\}$ for each $a \in I$. I is the index
class.

We cannot in general collect all the $A_a$ into a class, yet we can (informally)
“define” their union and intersection:

$\bigcup_{a \in I} A_a$ stands for $\{x : (\exists a \in I)\, x \in A_a\}$, that is, $\{x : (\exists a \in I)\mathcal{A}(a, x)\}$

and

$\bigcap_{a \in I} A_a$ stands for $\{x : (\forall a \in I)\, x \in A_a\}$, that is, $\{x : (\forall a \in I)\mathcal{A}(a, x)\}$

The case $I = \mathbb{N}$ occurs frequently in informal discussions.


III.10.23 Proposition. If for $n \ge 2$ all of $A_1, \ldots, A_n$ are sets, then so is
$A_1 \times \cdots \times A_n$.


Proof. By III.10.21 and induction on n.


III.10.24 Corollary. For any set A, $A^n$ is a set for $n \ge 1$.
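
The inductive definition in III.10.21 can be mirrored on finite sets as a left fold. This is a sketch under our own assumptions: tuples model ordered pairs, an n-tuple is the left-nested pair $((\vec x_{n-1}), x_n)$, and the names `cartesian`, `n_fold`, and `power` are hypothetical helpers.

```python
from functools import reduce

def cartesian(A, B):
    """Binary product A x B, with tuples standing in for ordered pairs."""
    return {(a, b) for a in A for b in B}

def n_fold(sets):
    """Product of a non-empty list [A1, ..., An], built as (...(A1 x A2) x ...) x An.

    For a one-element list this returns A1 itself, matching the base case.
    """
    return reduce(cartesian, sets)

def power(A, n):
    """A^n for n >= 1, i.e. the n-fold product of A with itself."""
    return n_fold([A] * n)

triple = n_fold([{0}, {1}, {2}])   # its only element is the nested pair ((0, 1), 2)
```

Each step of the fold applies the binary product to two finite sets, which is the finite shadow of the induction on n used in III.10.23.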


III.11. Relations and Functions 


We intuitively picture a binary relation as a table of rows, each row containing 
two objects, a first object (occupying the first column) and a second object 
(occupying the second column) — see also Section I.2, p. 20. 

A table naturally leads to a (usually) one-to-many “rule” that to each object
from some class associates one or more elements from another (possibly the
same) class: Simply associate to the first object on each row (the input
object) the second object of the row (the output object).

Conversely any one-to-many “rule”, regardless of how it is expressed (re- 
gardless of intention) can be represented as a table by forming a class of rows 
where the second object in each row is associated to the first, according to the 
given rule. 




“Consider”, not “collect”, since some A, may be proper classes and we are unwilling to collect 
other than sets or atoms into a class. 

This is a “theorem schema”. For each value of the informal object n we have a (different) theorem:
“$A \times B$ is a set”; “$A \times B \times C$ is a set”; etc. We have suggested above a (meta)proof of all these
theorems at once by induction in the metatheory.

Such a table may have, intuitively, infinite length. 

Hence the term “one-to-many”. 


Our rigorous counterpart of a rule or table is then a class of ordered pairs.†
Such a class is what we call a binary relation, or just relation.


III.11.1 Informal Definition (Binary Relations). A binary relation, or just
relation, is a class T whose members are (exclusively) ordered pairs.

Within ZFC, that “T is a relation” is argot for “$z \in T \to OP(z)$ is a theorem”.
Similarly, “let T be a relation” means “add the assumption $z \in T \to OP(z)$”.


If T is a relation, then the notations $(a, b) \in T$ and $b\,T\,a$ (note the order
reversal in the notation) mean exactly the same thing.


A relation T often is introduced as a class term $\{(x, y) : \mathcal{T}[x, y]\}$ or,
equivalently (III.10.16), $\{z : OP(z) \land \mathcal{T}[\pi(z), \delta(z)]\}$.

We call $\mathcal{T}$ the defining formula of the relation T. In most practical cases $\mathcal{T}$
has no parameters, that is, x and y are its only free variables. In such cases we
say that T is the relational implementation of $\mathcal{T}(x, y)$ or that it is the relational
extension of $\mathcal{T}(x, y)$.


III.11.2 Remark. (1) The term “binary”, understood if omitted, refers to the
fact that we have a class of (ordered) pairs in mind.

Some mathematicians, especially in the context of a discrete mathematics
course, want to have n-ary relations, for any $n \ge 1$, that is, classes whose
members are n-tuples (III.10.4, p. 185). We will not spend any nontrivial amount
of time on those, since for any $n \ge 2$, $(\vec x_n) = ((\vec x_{n-1}), x_n)$ (by III.10.4), and
therefore any n-ary relation, for $n \ge 2$, is a binary relation. For n = 1 we
have the unary relations, that is, classes of elements that are 1-tuples, (x).
Since (x) = x (cf. III.10.4), unary relations are just classes with no additional
requirements imposed on their elements. We will not use the terminology “unary
relations”; rather we will just call them classes (or sets, as the case may be). For
the record, when one uses n-ary relations, one usually abbreviates “$(\vec x_n) \in T$”
by “$T(\vec x_n)$”. In particular, when n = 2, the texts “$(x, y) \in T$”, “$T(x, y)$” and
“$y\,T\,x$” state the same thing.

An n-ary relation T may naturally arise as the extension or implementation
of a formula of n free variables, that is, as $T = \{(\vec x_n) : \mathcal{T}(\vec x_n)\}$ (see III.8.7). In
this case, and in view of the above comment, the texts “$T(\vec x_n)$” and “$\mathcal{T}(\vec x_n)$”
are interchangeable in the argot of n-ary relations.


(2) The empty set is obviously a relation. 


(3) Note the reversal of order in “$(a, b) \in T$ iff $b\,T\,a$” in III.11.1. This is
one of a variety of tricks employed in the literature in order to make notation
regarding composition consistent between relations and functions (we will
return to clarify this point when composition is introduced). The trick employed
here is as in Shoenfield (1978); for a different one see Levy (1979).

† Much is to be gained in notational convenience if we do not restrict relations to be sets.


(4) In the same spirit, whenever the defining formula $\mathcal{F}$ of a relation F is
by convention written in so-called infix notation (i.e., $x\,\mathcal{F}\,y$) rather than prefix
(i.e., $\mathcal{F}(x, y)$), e.g., one writes $x < y$ rather than $<(x, y)$, then we observe
this reversal by writing $F = \{(y, x) : x\,\mathcal{F}\,y\}$ or $F = \{z : OP(z) \land \delta(z)\,\mathcal{F}\,\pi(z)\}$.
This notation has the nice side effect that $a\,F\,b \leftrightarrow a\,\mathcal{F}\,b$ is provable. For example,
$\{(y, x) : x < y\}$ is the relational implementation of the formula $x < y$.


Note. We will continue writing $F = \{(x, y) : \mathcal{F}(x, y)\}$, that is, there is no
reversal of variables if the defining formula $\mathcal{F}$ is written in the usual prefix
notation.


Whenever $b\,T\,a$ holds, we say that T, when presented with input a,
“responds” with b among its (possibly many different) outputs.


(5) A relation often inherits the name of the defining formula. Thus the
relation $\{(y, x) : x \in y\}$ is also denoted by “$\in$”, and $(y, x) \in\ \in$ means $x \in y$.
The left “$\in$” in “$(y, x) \in\ \in$” is the nonlogical symbol, while the right “$\in$” is
the informal name of the relation that extends the formula $x \in y$. Similarly, $<$
is used as both the name of $\{(y, x) : x < y\}$ and that of the defining formula;
thus $(y, x) \in\ <$ means $x < y$.

With some practice, all this will be less confusing than at first sight. 


III.11.3 Example (Informal). Here are some relations: $\emptyset$, $\{(0, 1)\}$, $\mathbb{R}^2$ (where
$\mathbb{R}$ is the set of all reals), $\{(0, 1, 2)\}$. According to the above remark, the last
example is both 3-ary (ternary) and binary, since $(0, 1, 2) = ((0, 1), 2)$.


III.11.4 Informal Definition. Let S be any class (binary relation or not).

dom(S), its domain, is an abbreviation for the class $\{x : (\exists y)(x, y) \in S\}$
or $\{\pi(z) : OP(z) \land z \in S\}$, i.e., the class of all “useful” inputs, those
which do “cause” some output in S. The range of S, ran(S), on the other hand
stands for the class of all the outputs “caused” by all inputs in S. In symbols,
$\{y : (\exists x)(x, y) \in S\}$ or, equivalently, $\{\delta(z) : OP(z) \land z \in S\}$.

The argot concepts “dom” and “ran” apply, in particular, to relations S.

The class that contains all the useful inputs and all the outputs is the field of
the class S, that is, $\mathrm{dom}(S) \cup \mathrm{ran}(S)$.

If $S \subseteq A \times B$ for some A and B, then we say that “S is a relation on $A \times B$”
or that “S is a relation from A to B” or that “S is a relation that maps A into
B”. The symbol† $S : A \to B$ is read exactly like any one of the three previous
italicized sentences in quotes.


A is called the left field and B is called the right field of S.
If $S \subseteq A \times A$, then we say that S is a relation on A rather than on $A \times A$.


Given $S : A \to B$ and in the context of these fields, S is total iff dom(S) = A;
otherwise it is nontotal.


It is onto iff ran(S) = B. In this case we often say that S maps A onto B, or
just that S is onto.


The converse or inverse of any class S, in symbols $S^{-1}$, is the class
$\{(x, y) : (y, x) \in S\}$.
The concept of inverse (converse) applies, in particular, to relations S.


Let $S : A \to B$, $X \subseteq A$, and $Y \subseteq B$. The image of X under S, in symbols
$S[X]$, is the class of all the outputs that are caused by inputs in X, i.e.,
$\{y : (\exists x \in X)\, y\,S\,x\}$.


We have the non-standard‡ shorthand $S(c)$ for $S[\{c\}]$.


The inverse image of Y under S is just $S^{-1}[Y]$.
We have the non-standard shorthand $S^{-1}(c)$ for $S^{-1}[\{c\}]$.
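
For finite relations the notions just defined can be computed directly. The sketch below is illustrative only: relations are Python sets whose pair members are 2-tuples (input, output), non-pair members are ignored as in the definition of dom and ran, and all function names are our own.

```python
def is_pair(z):
    """Stand-in for OP(z): z is modelled as an ordered pair iff it is a 2-tuple."""
    return isinstance(z, tuple) and len(z) == 2

def dom(S):
    return {z[0] for z in S if is_pair(z)}

def ran(S):
    return {z[1] for z in S if is_pair(z)}

def field(S):
    return dom(S) | ran(S)

def inverse(S):
    """The class {(x, y) : (y, x) in S}."""
    return {(z[1], z[0]) for z in S if is_pair(z)}

def image(S, X):
    """S[X]: all outputs caused by inputs in X."""
    return {z[1] for z in S if is_pair(z) and z[0] in X}

def inverse_image(S, Y):
    return image(inverse(S), Y)

def is_total(S, A):
    return dom(S) == set(A)

def is_onto(S, B):
    return ran(S) == set(B)

S = {(0, "a"), (0, "b"), (1, "c"), 7}   # 7 is not a pair; dom and ran ignore it
```

Totality and ontoness take the left and right fields as explicit arguments, since they depend on that choice of fields.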


III.11.5 Remark. (1) The notions of left field and right field are not absolute;
they depend on the context. It is clear that once left and right fields are chosen,
then any super-class of the left (respectively, the right) field is also a left
(respectively, right) field. Conversely, one can always narrow the left field until it
equals dom(S), thus rendering S total. A similar comment holds for the concept
of onto. That may create the impression that the notions “total”, “nontotal”, and
“onto” are really useless.

This is not so, for in many branches of mathematics the studied relations and
functions have “natural” associated classes (usually sets) from which inputs are
taken and into which outputs are placed. For example, in (ordinary) recursion
theory functions take inputs from $\mathbb{N}$ and produce outputs in $\mathbb{N}$. It is a (provably)
unsolvable problem of that theory to determine for any given such function, in
general, whether it is total or onto.§ Therefore it is out of the question to make¶
left and right fields “small enough” to render the arbitrary such function total
and onto.


† The context will not allow confusion between the logical $\to$ and the one employed, as is the case
here, to mean “to”.

‡ A reason for its being non-standard becomes obvious as soon as we consider III.11.14.

§ By Rice’s theorem, proved in volume 1.

¶ “Make” with the tools of recursion theory, that is. Such tools are formalized algorithms.

(2) Note that for any a and b, $b \in S(a)$ iff $b \in S[\{a\}]$ iff $(\exists x \in \{a\})\, b\,S\,x$
iff $(\exists x)(x = a \land b\,S\,x)$ iff $b\,S\,a$. This pedantic (conjunctional) iff chain proves
(within pure logic, where we wrote the argot “iff” for “$\leftrightarrow$”) the obvious

$$b \in S(a) \leftrightarrow b\,S\,a \qquad (i)$$
Similarly,
$$b \in S^{-1}(a) \leftrightarrow b\,S^{-1}\,a \qquad (ii)$$

since $S^{-1}$ is a relation.
(3) The definition (III.11.4) of inverse relation is equivalent to

$$b\,S\,a \text{ iff } a\,S^{-1}\,b$$
Using (i) and (ii), we obtain at once
$$\vdash b \in S(a) \leftrightarrow a \in S^{-1}(b)$$
(4)
$$\vdash S[X] = \bigcup \{S(x) : x \in X\}$$

as the following calculation shows.

$\bigcup \{S(x) : x \in X\} = \{z : (\exists x)(x \in X \land z \in S(x))\}$   (by III.8.8)
    $= \{z : (\exists x)(x \in X \land z\,S\,x)\}$   (by (i) above)
    $= S[X]$   (Definition III.11.4)

(5)
$$\vdash S^{-1}[Y] = \{x : Y \cap S(x) \ne \emptyset\}$$
as the following calculation shows:

$S^{-1}[Y] = \{x : (\exists y \in Y)\, x\,S^{-1}\,y\}$
    $= \{x : (\exists y)(y \in Y \land y\,S\,x)\}$
    $= \{x : (\exists y)(y \in Y \land y \in S(x))\}$
    $= \{x : (\exists y)(y \in Y \cap S(x))\}$
    $= \{x : Y \cap S(x) \ne \emptyset\}$
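
Identities (4) and (5) can be spot-checked on a small finite relation. This is a sanity check on one hand-picked example, not a proof; the helper names and the sample relation are our own, with `out(S, a)` playing the role of the relational shorthand for $S[\{a\}]$.

```python
def image(S, X):
    """S[X] for a finite relation S given as a set of (input, output) tuples."""
    return {y for (x, y) in S if x in X}

def out(S, a):
    """The shorthand for S[{a}]: the set of all outputs for the single input a."""
    return image(S, {a})

def inverse(S):
    return {(y, x) for (x, y) in S}

S = {(0, "a"), (0, "b"), (1, "c"), (2, "a")}
X = {0, 2}
Y = {"a", "c"}

# (4): S[X] equals the union of the output sets of the members of X
union_of_outputs = set().union(*(out(S, x) for x in X))

# (5): the inverse image of Y consists of the inputs whose output set meets Y
lhs5 = image(inverse(S), Y)
rhs5 = {x for (x, _) in S if Y & out(S, x)}
```

Inputs outside dom(S) have an empty output set, so restricting the comprehension for `rhs5` to inputs occurring in S loses nothing.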


III.11.6 Example (Informal). Let $S = \{(0, a), (0, b), (1, c), (\{0, 1\}, a)\}$.
Then $S(0) = S[\{0\}] = \{a, b\}$, $S[\{0, 1\}] = \{a, b, c\}$. On the other hand,
$S(\{0, 1\}) = S[\{\{0, 1\}\}] = \{a\}$. Thus, $S[\{0, 1\}] \ne S(\{0, 1\})$.
This phenomenon occurs because dom(S) has a member, namely $\{0, 1\}$,
which is also a subset of dom(S). One encounters a lot of sets like this in set
theory, so the common notation “S(X)”, which is used in naive approaches both
for the image (when X is viewed as a collection of points) and for the output(s)
when X is a single input (X now being viewed as a point), would have been
ambiguous in our setting.


What does $S(a) = \emptyset$ mean? By III.4.11 it translates into $\lnot(\exists x)\, x \in S(a)$.
This is (logically) equivalent to $a \notin \{z : (\exists x)\, x\,S\,z\}$, that is, $a \notin \mathrm{dom}(S)$, or S(a)
is undefined.
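
The ambiguity described in this example shows up concretely once a set is allowed to be an input. In the sketch below (our own modelling choices) the input $\{0, 1\}$ is a `frozenset` so that it can be a member of a pair, `image` computes S[X], and `out` computes the single-input shorthand.

```python
def image(S, X):
    """S[X]: outputs caused by inputs that are members of X."""
    return {y for (x, y) in S if x in X}

def out(S, c):
    """Shorthand for S[{c}]: outputs caused by the single input c."""
    return image(S, {c})

both = frozenset({0, 1})   # the set {0, 1}, used here as one input
S = {(0, "a"), (0, "b"), (1, "c"), (both, "a")}

as_collection = image(S, {0, 1})   # {0, 1} viewed as a collection of inputs
as_point = out(S, both)            # {0, 1} viewed as a single input
undefined = out(S, 5)              # 5 is not in dom(S), so no outputs
```

An empty output set is exactly the case the text calls undefined.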


III.11.7 Example (Informal). (1) Let $<$ and $>$ be the usual predicates on $\mathbb{N}$,
and let us use the same symbols for the relational extensions of the atomic
formulas $x < y$ and $x > y$. Then, $<(3) = \{x : x < 3\} = \{0, 1, 2\}$. Similarly,
$>(3) = \{4, 5, 6, \ldots\}$.


(2) Let $M = \{(0, x) : x = x\}$. Then $\mathrm{dom}(M) = \{0\}$ and $\mathrm{ran}(M) = U_M$.
Thus a relation that is a proper class can have a domain which is a set. Similar
comment for the range (think of $M^{-1}$).


III.11.8 Proposition. If the relation S is a set, then so are dom(S) and ran(S).


Proof. Assume the hypothesis. By
$$z \in S \to OP(z) \models_{Taut} z \in S \leftrightarrow z \in S \land OP(z)$$
and III.11.4 we get $\mathrm{dom}(S) = \{\pi(z) : z \in S\}$. The claim follows now from
III.8.9.


The argument for ran(S) just uses $\delta(z)$ instead.


III.11.9 Informal Definition. For any relation S, “$a \in \mathrm{dom}(S)$” is pronounced
“S(a) is defined”. We use the symbol “$S(a)\downarrow$” to indicate this. Correspondingly,
“$a \notin \mathrm{dom}(S)$” is pronounced “S(a) is undefined”. We use the symbol “$S(a)\uparrow$”
to indicate this.

If T is a relation and $S \subseteq T$, then T is an extension of S, and S is a restriction
of T.

If T is a relation and A is some class, then a restriction of T on A is usually
obtained in one of two ways:


(1) Restrict both inputs and outputs to be in A, to obtain $T \cap A^2$. The symbol
$T \mid A$ is used as a shorthand for this restriction.

(2) Restrict only the inputs to be in A, to obtain $\{x \in T : \pi(x) \in A\}$. This
restriction is denoted by $T \upharpoonright A$.

III.11.10 Example (Informal). Suppose that we are working in $\mathbb{N}$. Thus
$\{(2, 1), (2, 0), (1, 0)\} = {<} \cap \{0, 1, 2\}^2$. It is also the case that
$\{(2, 1), (2, 0), (1, 0)\} = {<} \upharpoonright \{0, 1, 2\}$, so that both versions of restricting the relation $<$ on the
set $\{0, 1, 2\}$ give the same result.

Now, $\{(1, 2), (0, 2), (0, 1)\} = {>} \cap \{0, 1, 2\}^2$. However,

$${>} \upharpoonright \{0, 1, 2\} = \{(0, 1), (0, 2), \ldots, (1, 2), (1, 3), \ldots, (2, 3), (2, 4), \ldots\}$$

Here the two versions of restriction are not the same.
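
The two restrictions can be compared on a finite fragment of $\mathbb{N}$. The sketch assumes the cut-off `range(10)` in place of all natural numbers (so the infinite restriction of $>$ is truncated), and stores each relation with the input first and the output second, following the reversal convention of III.11.2(4); the helper names are our own.

```python
N = range(10)   # assumption: a finite fragment standing in for the natural numbers

# Relational extensions with reversal: (y, x) is a member iff x < y (resp. x > y)
lt = {(y, x) for y in N for x in N if x < y}
gt = {(y, x) for y in N for x in N if x > y}

def restrict_both(T, A):
    """Version (1): T intersected with A x A."""
    return {(x, y) for (x, y) in T if x in A and y in A}

def restrict_inputs(T, A):
    """Version (2): keep the pairs whose first projection is in A."""
    return {(x, y) for (x, y) in T if x in A}

A = {0, 1, 2}
```

For $<$ the outputs of small inputs are themselves small, which is why the two versions agree there and disagree for $>$.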


III.11.11 Remark. (1) By the concluding remarks in III.11.6, $S(a)\downarrow$ iff
$S(a) \ne \emptyset$, while $S(a)\uparrow$ iff $S(a) = \emptyset$.

(2) For relations T the most commonly used version of restriction is T | A. 
Occasionally one sees T | A for T | A (Levy (1979)). 

(3) A relation S : A — B is sometimes called a partial multiple-valued 
function from A to B. “Partial” refers to the possibility of being nontotal, while 
“multiple-valued” refers to the fact that, in general, S can give several outputs 
for a given input. 

Remove this possibility, and you get a (partial) function. 


III.11.12 Informal Definition ((Informal) Functions). A function F is a
single-valued relation, more precisely, single-valued in the second projection.

If we are working in ZFC, then “F is single-valued in the second projection”
is argot for


“$x \in F \land y \in F \land \pi(x) = \pi(y) \to \delta(x) = \delta(y)$”   (1)


Thus, “if F is a function, ...” adds (1) to the axioms (of ZFC), while “... F is 
a function ...” claims that (1) is a theorem. 

If F : A > B, then F is a partial function from A to B. The qualification 
“partial” will always be understood (see above remark), and therefore will not 
be mentioned again. 


III.11.13 Remark. (1) The definition of single-valuedness can also be stated
as $b\,F\,a \land c\,F\,a \to b = c$ (where a, b, c are free), or even
$(a, b) \in F \land (a, c) \in F \to b = c$.


(2) Clearly, if F is a function,† then we can prove $z \in F \to F(\pi(z)) = \{\delta(z)\}$
(we have $\supseteq$ by definition of F(x) and $\subseteq$ by single-valuedness).


(3) $\emptyset$ is a function.


† We are not going to continue reminding the reader that this is argot. See III.11.12.

(4) Since a relation is the “implementation” of a formula as a class, so is a
function. But if the relation F, defined from the formula $\mathcal{F}(x, y)$, is a function,
that is, we have the abbreviation

$$F = \{(x, y) : \mathcal{F}(x, y)\} \qquad (i)$$
and also a proof of
$$(x, y) \in F \land (x, z) \in F \to y = z \qquad (ii)$$

then we must also be able to prove
$$\mathcal{F}(x, y) \land \mathcal{F}(x, z) \to y = z \qquad (iii)$$

Indeed, we are (and a bit more).

First off, we see at once, by (slightly ab)using III.4.1(iii)† (p. 134), that
“$(x, y) \in F$” abbreviates $\mathcal{F}(x, y)$.

A more serious (i.e., complete) reason is this: “$(x, y) \in F$” is logically
equivalent to

$$(\exists u)(\exists v)((u, v) = (x, y) \land \mathcal{F}(u, v))$$

by III.8.7. By III.10.3 and the equivalence theorem, the above translates (is
provably equivalent) to

$$(\exists u)(\exists v)(u = x \land v = y \land \mathcal{F}(u, v))$$
Two applications of the one point rule yield the logically equivalent formula
$\mathcal{F}(x, y)$.
With this settled, we see that (ii) and (iii) are indeed provably equivalent if
F is given by (i).
(5) Since a function is a relation, all the notions and notation defined
previously for relations apply to functions as well. We have some additional notation
and concepts peculiar to functions:


III.11.14 Informal Definition. If F is a function and $b \in F(a)$ (the relational
shorthand of III.11.4), then (by uniqueness of output) $\{b\} = F(a)$. For functions
only we employ the abbreviation $b = F(a)$. Note the round brackets.

If $a = (\vec x_n)$ we agree to write $F(\vec x_n)$ rather than $F((\vec x_n))$.

Functions that are sets will be generically denoted, unless they have specific
names, by the letters f, g, h.


† This informal definition gives the meaning of $z \in \{x : \mathcal{A}[x]\}$, not that of $z \in \{t[x] : \mathcal{A}[x]\}$.

We now see why we have two different notations for functions and relations
when it comes to the image of an input. For a function, F(a) with round brackets
is the output itself, while the relational shorthand F(a) of III.11.4 is
the singleton $\{F(a)\}$.†


III.11.15 Example (Informal). Here are two examples of relations from “real”
mathematics: $C = \{(x, y) \in \mathbb{R}^2 : x^2 + y^2 = 1\}$ and
$H = \{(x, y) \in \mathbb{R}^2 : x^2 + y^2 = 1 \land x \ge 0 \land y \ge 0\}$.

H, but not C, is a function. Clearly H is the restriction of C on the
non-negative reals, $\mathbb{R}_{\ge 0}$, in the sense $H = C \cap \mathbb{R}_{\ge 0}^2$.
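
Single-valuedness is easy to test on finite relations. The sketch below is an analogue of the example, not the example itself: to stay within exact integer arithmetic it assumes the integer points of the circle $x^2 + y^2 = 25$ instead of the real unit circle, and `is_function` is a hypothetical helper.

```python
def is_function(F):
    """True iff no input of the finite relation F has two distinct outputs."""
    seen = {}
    for x, y in F:
        if x in seen and seen[x] != y:
            return False
        seen[x] = y
    return True

pts = range(-5, 6)
# Integer points on the circle of radius 5 (our stand-in for C)
C = {(x, y) for x in pts for y in pts if x * x + y * y == 25}
# The part of C lying in the non-negative quadrant (our stand-in for H)
H = {(x, y) for (x, y) in C if x >= 0 and y >= 0}
```

As in the text, the full circle gives two outputs to some inputs, for instance 4 and -4 for the input 3, while the quarter circle does not.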


III.11.16 Informal Definition (Function Substitution). If S is a relation,
$\mathcal{F}(x, y_1, \ldots, y_n)$ is a formula, and G is a function, then


$\mathcal{F}(G(x), y_1, \ldots, y_n)$ abbreviates $(\exists z)(z = G(x) \land \mathcal{F}(z, y_1, \ldots, y_n))$


In particular, $G(x)\,S\,a$ stands for $(\exists z)(z = G(x) \land z\,S\,a)$, and $a\,S\,G(x)$ for
$(\exists z)(z = G(x) \land a\,S\,z)$.

In short, we have introduced abbreviations that extend the one point rule in 
the informal domain, with informal terms such as G(x) that are not necessarily 
admissible in the formal theory. 


III.11.17 Remark (Informal). (1) Take the relations “$=$” ($\stackrel{\text{def}}{=} \{(x, y) : x = y\}$)
and “$\ne$” ($\stackrel{\text{def}}{=} \{(x, y) : \lnot x = y\}$), both on $\mathbb{N}$, and a function $f : \mathbb{N} \to \mathbb{N}$. Then

$$y \ne f(x) \text{ iff } (\exists z)(z \ne y \land z = f(x)) \qquad (i)$$

by III.11.16. Call this relation F. Also, let $T \stackrel{\text{def}}{=} \{(x, y) : y = f(x)\}$.
Now $\mathbb{N}^2 - T = \{(x, y) : f(x)\uparrow \lor (\exists z)(z \ne y \land z = f(x))\}$, for there are
two ways to make $y = f(x)$ fail:


(a) $f(x)\uparrow$, since $y = f(x)$ implies $f(x)\downarrow$, or
(b) $(\exists z)(z \ne y \land z = f(x))$.


Thus, unless f is total (in which case $f(x)\uparrow$ is false for all x), $\mathbb{N}^2 - T \ne F$.
This observation is very important if one works with nontotal functions a lot
(e.g., in recursion theory).


† Sometimes one chooses to abuse notation and use “F(a)” for both the singleton (thinking of F
as a relation) and the “raw output” (thinking of F as a function). Of course the two uses of the
notation are inconsistent, especially in the presence of foundation, and the context is being asked
to do an unreasonable amount of fending against this. The F(x) notation that we chose restores
tranquillity.

(2) According to Definition III.11.16, for any functions F and G,
$F(a) = G(b)$ means $(\exists x)(\exists y)(x = F(a) \land y = G(b) \land x = y)$, or more simply,
$(\exists x)(x = F(a) \land x = G(b))$. This is satisfactory for most purposes, but note that
“=” between partial functions is not reflexive! Indeed, if both F(a) and G(b)
are undefined, then they are not equal, although you would prefer them to be.

Kleene fixed this for the purposes of recursion theory with his weak equality,
“$\simeq$”, defined (informally) by


$F(a) \simeq G(b)$ abbreviates $F(a)\uparrow \land\, G(b)\uparrow \lor\, (\exists x)(x = F(a) \land x = G(b))$

Clearly, $\vdash F(a)\uparrow \land\, G(b)\uparrow \to F(a) \simeq G(b)$.†


Whenever we use “=” we mean ordinary equality (where, in particular,
$F(a) = G(b)$ entails $F(a)\downarrow$ and $G(b)\downarrow$). On those occasions where weak equality
is employed, we will use the symbol “$\simeq$”.
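
The difference between ordinary and weak equality is visible on finite partial functions. In this sketch, which rests on our own modelling assumptions, a partial function is a Python dict, “undefined at a” means that a is not a key, and the names `strong_eq` and `weak_eq` are hypothetical.

```python
def strong_eq(F, a, G, b):
    """Ordinary equality F(a) = G(b): both sides are defined and the values agree."""
    return a in F and b in G and F[a] == G[b]

def weak_eq(F, a, G, b):
    """Kleene's weak equality: both sides undefined, or ordinary equality holds."""
    return (a not in F and b not in G) or strong_eq(F, a, G, b)

f = {0: 1, 1: 2}   # a partial function on the naturals, undefined from 2 on
```

With these definitions `strong_eq(f, 2, f, 2)` is false, which is the failure of reflexivity noted in the remark, while `weak_eq(f, 2, f, 2)` is true.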


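III.11.18 Exercise. For any relations S and T prove


(1) $S \subseteq T \leftrightarrow (\forall x)(S(x) \subseteq T(x))$,
(2) $S = T \leftrightarrow (\forall x)(S(x) = T(x))$.


Also prove that for any two functions F and G,


(3) $F = G \leftrightarrow (\forall x)(F(x) \simeq G(x))$


while


(4) $F = G \leftrightarrow (\forall x)(F(x) = G(x))$ fails.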


III.11.19 Exercise. For any (formal) term $t(\vec x_n)$ and class term A, the class term
$$F = \{(\vec x_n, t(\vec x_n)) : (\vec x_n) \in A\}$$
is the binary relation (see III.8.7)
$$F = \{z : OP(z) \land (\exists x_1) \ldots (\exists x_n)(\pi(z) = (\vec x_n) \land \delta(z) = t(\vec x_n) \land \pi(z) \in A)\}$$

Prove that F is single-valued in the second projection ($\delta(z)$), and hence is a
function.


† Indeed not just “$\vdash$”, but “$\models_{Taut}$”. On the right hand side of “$\to$” we expand the abbreviation into
$F(a)\uparrow \land\, G(b)\uparrow \lor\, (\exists x)(x = F(a) \land x = G(b))$.

III.11.20 Informal Definition ($\lambda$-Notation). We use a variety of notations to
indicate the dependence of the function F of the preceding exercise on the
“defining term” t, usually letting A be understood from the context. We may
write any of the following:


(1) $F(\vec x_n) = t(x_1, \ldots, x_n)$ for all $\vec x_n$ (recall that we write $F(\vec x_n)$ rather than $F((\vec x_n))$
and that $(\vec x_n, y) = ((\vec x_n), y)$).

(2) $F = (\vec x_n \mapsto t(x_1, \ldots, x_n))$.

(3) $F = \lambda \vec x_n . t(x_1, \ldots, x_n)$ ($\lambda$-notation).


III.11.21 Example (Informal). If we work in $\mathbb{N}$ informally, we can define,
from the term $y^2$, a function

$$f = \{(x, y, y^2) : (x, y) \in \mathbb{N}^2\} \qquad (1)$$

We can then write
$$f = \lambda x y . y^2 \qquad (2)$$
This function has two inputs. One, x, is ignored when the output is “computed”.


Such variables (inputs) are sometimes called “dummy variables”.


$\lambda$-notation gives us the list of variables (between $\lambda$ and “.”) and the “rule”
for finding the output (after the “.”). The left and right fields (here $\mathbb{N}^2$ and $\mathbb{N}$
respectively) must be understood from the context.

In practice one omits the largely ceremonial part of introducing (1) and
writes (2) at once.


Some restricted types of functions are important. 


III.11.22 Informal Definition. A function F is one-to-one, or simply 1-1, iff
it is single-valued in the first projection, that is,

$$z \in F \land w \in F \land \delta(z) = \delta(w) \to \pi(z) = \pi(w) \qquad (1)$$
Alternatively, we may write

$$F(x) = F(y) \to x = y \qquad (2)$$


A 1-1 function is also called injective or an injection. 


As we feel obliged after the introduction of new argot to issue the usual
clarifications, we state: When used in ZFC, “F is 1-1” is just short for (1) or (2)
above. Thus, to assume that F is 1-1 is tantamount to adding (1) (equivalently,
(2)) to the axioms, while to claim that F is 1-1 is the same as asserting its (ZFC)
provability.
Note that $F(x) = F(y)$ implies that both sides of “=” are defined
(cf. III.11.16). In the opposite situation $F(x) = F(y)$ is refutable; hence (2)
still holds.

The above definition can also be stated as $u\,F\,x \land u\,F\,y \to x = y$. We can say
that a 1-1 function “distinguishes inputs”, in that distinct inputs of its domain
are mapped into distinct outputs.

Note that $f = \{(0, 1), (1, 2)\}$ is 1-1 by III.11.22, but while $f(2) \simeq f(3)$
(III.11.17), it is the case that $2 \ne 3$. Nevertheless, $f(2) = f(3) \to 2 = 3$, since
$f(2) = f(3)$ is refutable.


1-1-ness is a notion that is independent of left or right fields (unlike the
notions total, nontotal, onto).


III.11.23 Example. A function F is 1-1 iff $F^{-1}$ is a function. Indeed, F is 1-1
iff it is single-valued in the first projection, iff $F^{-1}$ is single-valued in the
second projection.
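
The equivalence can be observed on finite functions. This is a sketch with our own helper names; functions are finite sets of (input, output) tuples, and the two sample relations are chosen for illustration.

```python
def inverse(F):
    return {(y, x) for (x, y) in F}

def is_function(F):
    """Single-valued in the second projection."""
    return all(y1 == y2 for (x1, y1) in F for (x2, y2) in F if x1 == x2)

def is_one_to_one(F):
    """Single-valued in the first projection."""
    return all(x1 == x2 for (x1, y1) in F for (x2, y2) in F if y1 == y2)

f = {(0, 1), (1, 2)}   # 1-1: distinct inputs give distinct outputs
g = {(0, 1), (1, 1)}   # not 1-1: inputs 0 and 1 share the output 1
```

Swapping the components of every pair turns the test for one projection into the test for the other, which is all the example asserts.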


III.11.24 Informal Definition (1-1 correspondences). A function $F : A \to B$
is a 1-1 correspondence iff it is 1-1, total, and onto. We say that A and B are in
1-1 correspondence and write


A~ or Aw 


A 1-1 correspondence is also called a bijection, or a bijective function. An onto 
function is also called a surjection, or surjective. © 


III.11.25 Example (Informal). The notion of 1-1 correspondence is very
important. If two sets are in 1-1 correspondence, then, intuitively, they have “the
same number of elements”. On this observation rests the theory of cardinality
and cardinal numbers (Chapter VII).

For example, $\lambda n.2n : \mathbb{N} \to \{2n : n \in \mathbb{N}\}$ is a 1-1 correspondence between
all natural numbers and all even (natural) numbers.


Let us now re-formulate the axiom of collection with the benefit of relational 
and functional notation. 


III.11.26 Theorem (Collection in Argot). 


(1) For any relation S such that dom(S) is a set, there is a set B such that
$S^{-1}[B] = \mathrm{dom}(S)$.

(2) For any relation S such that ran(S) is a set, there is a set A such that
$S[A] = \mathrm{ran}(S)$.

Proof. (1): Let $S = \{(x, y) : \mathcal{S}(x, y)\}$ for some formula $\mathcal{S}$ of the formal
language. Let Z = dom(S). An instance of “verbose” collection (cf. III.8.4) is

$$(\forall x \in Z)(\exists y)\mathcal{S}(x, y) \to (\exists W)(\lnot U(W) \land (\forall x \in Z)(\exists y \in W)\mathcal{S}(x, y)) \qquad (i)$$

Now, we are told that $Coll_x (\exists y)\mathcal{S}(x, y)$ (cf. III.11.4); thus the assumption
Z = dom(S) translates into

$$\vdash_{ZFC} (\forall x)(x \in Z \leftrightarrow (\exists y)\mathcal{S}(x, y))$$

and therefore the left hand side of (i) is provable, by tautological implication
and $\forall$-monotonicity (I.4.24). Thus the following is also a theorem:
$$(\exists W)(\lnot U(W) \land (\forall x \in Z)(\exists y \in W)\mathcal{S}(x, y)) \qquad (ii)$$
Let us translate (ii) into argot: We are told that a set W exists† such that
$$(\forall x \in Z)(\exists y)(y \in W \land y \in S(x))$$
Hence
$$(\forall x \in Z)\, W \cap S(x) \ne \emptyset$$
and finally (see Remark III.11.5(5))
$$Z \subseteq S^{-1}[W]$$

Since, trivially, $Z \supseteq S^{-1}[W]$ we see that W will do for the sought B.
(2) follows from (1) using $S^{-1}$ instead of S.


III.11.27 Remark. Statement (1) in the theorem, and therefore (2), are
“equivalent” to collection. That is, if we have (1), then we also have (i) above. To see
this, let $\mathcal{S}(x, y)$ of the formal language satisfy the hypothesis of collection, (i):

$$(\forall x \in Z)(\exists y)\mathcal{S}(x, y) \qquad (a)$$

for some set Z. Let us define $S \stackrel{\text{def}}{=} \{(x, y) : \mathcal{S}(x, y) \land x \in Z\}$. Then (a) yields
Z = dom(S). (1) now implies that for some set B, $Z \subseteq S^{-1}[B]$, from which
the reader will have no trouble deducing

$$(\forall x \in Z)(\exists y \in B)\mathcal{S}(x, y) \qquad (b)$$

(b) proves $(\exists W)(\lnot U(W) \land (\forall x \in Z)(\exists y \in W)\mathcal{S}(x, y))$.


† We are using the auxiliary constant W, in other words.

III.11.28 Proposition.


(1) If S is a function and A is a set, then so is S[A].
(2) If S is a function and dom(S) is a set, then so is ran(S).


Proof. (2) follows from (1), since ran(S) = S[dom(S)].
As for (1), it is argot for collection version III.8.12(4). Indeed, letting

$$S = \{(x, y) : \mathcal{S}(x, y)\}$$
the assumption that S is a function yields (cf. III.11.13(4)) the theorem
$$\mathcal{S}(x, y) \land \mathcal{S}(x, z) \to y = z$$
The aforementioned version of collection then yields
$$Coll_y (\exists x \in A)\mathcal{S}(x, y)$$
i.e., that the class term

$$\{y : (\exists x \in A)\mathcal{S}(x, y)\}$$

can be formally introduced (“is a set”). This is exactly what we want.


We have already noted in III.8.3(11), p. 164, that the proposition, being an
argot rendering of III.8.12(4), is equivalent to collection III.8.2.† A proof
of this equivalence will be given later once rank (of set) and stage (of set
construction) have been defined rigorously. In the meanwhile, in practice, the
proposition (i.e., collection version III.8.12(4)) will often be used in lieu of
collection (being an implication of the latter, this is legitimate).


III.11.29 Corollary. If F is a function and dom(F) is a set, then F is a set.


Proof. By III.11.28, ran(F) is a set. But $F \subseteq \mathrm{dom}(F) \times \mathrm{ran}(F)$.
Alternatively, let $G \stackrel{\text{def}}{=} \{(x, (x, F(x))) : x \in \mathrm{dom}(F)\}$. Clearly, G is a
function and dom(G) = dom(F), while ran(G) = F.


The notion of function allows us to see families of sets from a slightly 
different notational viewpoint. More importantly, it allows us to extend the 
notion of Cartesian product. First of all, 


III.11.30 Informal Definition (Indexed Families of Sets). A function F such
that ran(F) contains no urelements is an indexed family of sets. dom(F) is the
index class. If $\mathrm{dom}(F) = \emptyset$, then we have an empty indexed family.

If we let I be a name for dom(F), then we often write $(F_a)_{a \in I}$ to denote the
indexed family, rather than just F or $\lambda a.F(a)$.

† Or, as we simply say, collection.


In the notation “$F_a$” it is not implied that F(a) might be a proper class (it
cannot be); rather we imply that the function F might be a proper class.


Note that we called F, rather than ran(F), the indexed family (of course, ran(F)
is a family of sets in the sense of III.6.3, p. 150). What is new here is the intention
to allow “multiple copies” of a set in a family with the help of F. An indexed
family allows us to be able to talk about, say, $S = \{a, b, a, a, c, d\}$ without being
obliged to collapse the multiple a-elements into one (extensionality would
dictate this if we had just a set or class $\{a, b, a, a, c, d\}$). This freedom is
achieved by thinking of the first a as, say, f(0), the second as f(2), and the
third as f(3), where

$$f = \{(0, a), (1, b), (2, a), (3, a), (4, c), (5, d)\}$$

is an indexed family with index set $\mathrm{dom}(f) = \{0, 1, 2, 3, 4, 5\}$, and ran(f) = S.
Why is this useful?

For example, if a, b, c, d, ... are cardinals (Chapter VII), we may want to
study sums of these where multiple summands may be equal to each other. We
can achieve this with a concept/notation like $\sum_{i \in \mathrm{dom}(f)} f(i)$.

This situation is entirely analogous to one that occurs in the study of series
in real analysis, where repeated terms are also allowed.
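
The effect of indexing on repeated members can be seen with ordinary numbers. The sketch assumes concrete integer values for a, b, c, d and uses plain integer addition as a stand-in for the cardinal sums mentioned above; the dict `f` is our model of the indexed family.

```python
a, b, c, d = 2, 3, 5, 7   # assumption: sample values chosen for illustration

# The indexed family f = {(0, a), (1, b), (2, a), (3, a), (4, c), (5, d)}
f = {0: a, 1: b, 2: a, 3: a, 4: c, 5: d}

index_set = set(f)               # dom(f)
members = set(f.values())        # ran(f): the three copies of a collapse into one

sum_over_index = sum(f[i] for i in f)   # every copy of a is counted
sum_over_range = sum(members)           # a is counted only once
```

Summing over the index set keeps the repeated summands, while summing over the range silently drops them.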


III.11.31 Example. Every family of sets A in the sense of III.6.3 leads to an
indexed family of sets $\lambda x.x$ that has A as domain or index class.


Here is a family of sets in informal mathematics: $\{(0, 1/n) : n \in \mathbb{N} - \{0\}\}$,
where “(a, b)” here stands for open interval of real numbers. This can be viewed
as an indexed family fitting the general scheme, $\lambda x.x$, above.

A more natural way is to view it as the indexed family $\lambda n.(0, 1/n)$ of domain
$\mathbb{N} - \{0\}$, that is, $((0, 1/n))_{n \in \mathbb{N} - \{0\}}$.


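III.11.32 Informal Definition. Let $(F_a)_{a \in I}$ be an indexed family of sets. Then


$$\bigcup_{a \in I} F_a \overset{\text{def}}{=} \bigcup \mathrm{ran}(F)$$

$$\bigcap_{a \in I} F_a \overset{\text{def}}{=} \bigcap \mathrm{ran}(F)$$


If $I$ is a set, then


$$\prod_{a \in I} F_a \overset{\text{def}}{=} \{f : f \text{ is a function} \land \mathrm{dom}(f) = I \land (\forall a \in I) f(a) \in F_a\}$$


If, for each $a \in I$, $F_a = A$, the same set, then $\prod_{a \in I} F_a$ is denoted by ${}^I A$. That
is, ${}^I A$ is the class of all total functions from $I$ to $A$:


$$\{f : f \text{ is a function} \land \mathrm{dom}(f) = I \land (\forall a \in I) f(a) \in A\}$$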


III.11.33 Remark. (1) $\bigcup_{a \in I} F_a$ and $\bigcap_{a \in I} F_a$ just introduce new notation (see
also the related III.10.22, p. 192). However, $\prod_{a \in I} F_a$ is a new concept that is
related but is not identical to the “finite” Cartesian product. For example, if
$A_1 = \{1\}$ and $A_2 = \{2\}$, then


$$\mathop{\times}_{i=1}^{2} A_i = A_1 \times A_2 = \{\langle 1, 2\rangle\} \tag{i}$$


while if we consider the indexed family $(A_i)_{i \in \{1,2\}}$, then


$$\prod_{i \in \{1,2\}} A_i = \{\{\langle 1, 1\rangle, \langle 2, 2\rangle\}\} \tag{ii}$$

In general, an element of $A_1 \times \cdots \times A_n$ is an $n$-vector $\langle x_1, \ldots, x_n\rangle$, while an
element of $\prod_{i \in \{1,\ldots,n\}} A_i$ is a sequence $f = \{\langle 1, x_1\rangle, \ldots, \langle n, x_n\rangle\}$. Sequences
are not tuples, but they do give, in a different way, positional information, just
as tuples do. Sequences have the additional flexibility of being able to have
“infinite” length (intuitively). Thus, while $\langle a_1, a_2, \ldots\rangle$, where “...” stands for
$a_i$ for all $i \in N$, is meaningless as a tuple, it can be captured as a sequence
$f$, where $\mathrm{dom}(f) = N$ and $f(i) = a_i$ for all $i \in N$. This makes sequences
preferable to vectors in practice.


In this discussion we are conversing within informal (meta)mathematics. We
must not forget that in the formal theory we mirror only those objects that we
have proved to exist so far (i.e., to have counterparts, or representations, within
the theory); the arbitrary $n$, and $N$, are not among them.


(2) Requiring $I$ to be a set in the definition of $\prod$ ensures that the functions
$f$ that we collect into $\prod$ are sets (by III.11.29). This is necessary, for classes
can only have elements that are sets or atoms.

(3) The sub-formula “$f$ is a function” is argot for


$$(\forall z \in f)\bigl(OP(z) \land (\forall w \in f)(\pi(z) = \pi(w) \to \delta(z) = \delta(w))\bigr)$$
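The difference between (i) and (ii) can be checked on finite families. The sketch below is an informal illustration only: functions are modeled as Python dictionaries, tuples as Python tuples, and the helper name `indexed_product` is hypothetical. It assumes that indices and members can be sorted, which holds for the small integer examples used here.

```python
from itertools import product

def indexed_product(family):
    """All functions f with dom(f) = I and f(a) in F_a, for a finite family.

    `family` is a dict mapping each index a to a finite set F_a; each
    resulting function is itself represented as a dict.
    """
    indices = sorted(family)                       # fix an enumeration of I
    choices = [sorted(family[a]) for a in indices]
    return [dict(zip(indices, values)) for values in product(*choices)]

A = {1: {1}, 2: {2}}
cartesian = set(product(A[1], A[2]))   # as in (i): a set of tuples
functions = indexed_product(A)         # as in (ii): a collection of functions
```

Here `cartesian` contains the single tuple `(1, 2)`, whereas `functions` contains the single function `{1: 1, 2: 2}`. With an empty index set the only member of the product is the empty function, and if some $F_a$ is empty the product is empty.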


III.11.34 Proposition. If $I$ is a set, then $\bigcup_{a \in I} F_a$ and $\prod_{a \in I} F_a$ are sets. If moreover
$I \neq \emptyset$, then $\bigcap_{a \in I} F_a$ is a set as well.

Proof. Since $I$ is a set, so is $\mathrm{ran}(F)$ by III.11.28. Now the cases for $\bigcup_{a \in I} F_a$
and $\bigcap_{a \in I} F_a$ follow from III.11.32 and the axiom of union and from III.6.14,
respectively.
On the other hand,

$$\prod_{a \in I} F_a \subseteq P\Bigl(I \times \bigcup_{a \in I} F_a\Bigr)$$

Hence $\prod_{a \in I} F_a$ is a set.


III.11.35 Corollary. For any sets $A$ and $B$, ${}^A B$ is a set.


III.11.36 Example. Let $A$, $B$, $C$ be nonempty sets, that is, suppose we have
proved


$$(\exists x)x \in A$$
$$(\exists x)x \in B$$
and
$$(\exists x)x \in C$$

Let (by auxiliary constant)

$$a \in A$$
$$b \in B$$
$$c \in C$$


Then $\langle a, b, c\rangle \in A \times B \times C$; hence we have proved in ZFC

$$(\exists x)x \in A \land (\exists x)x \in B \land (\exists x)x \in C \to (\exists x)x \in A \times B \times C$$


We can do the same with four sets, or five sets, or eleven sets, etc., or prove by
(informal) induction on $n$, in the metatheory, the theorem schema (see III.10.21)

$$\text{if } A_i \neq \emptyset \text{ for } i = 1, \ldots, n, \text{ then } \mathop{\times}_{1 \le i \le n} A_i \neq \emptyset. \tag{1}$$

Metatheoretically speaking, we can “implement” any $n$-tuple $\langle a_1, \ldots, a_n\rangle \in \mathop{\times}_{1 \le i \le n} A_i$
as a sequence $f = \{\langle i, a_i\rangle : 1 \le i \le n\}$. That is, the function $f$ is
a set (by III.11.29) applied to the (informal) set $I = \{1, 2, 3, \ldots, n\}$.† Thus,
$f \in \prod_{i \in I} A_i$.


† Thinking of the informal natural numbers as urelements, $I$ is a set by separation, since the “real”
$M$ is a set.

It follows that


$$\text{if } A_i \neq \emptyset \text{ for } i = 1, \ldots, n, \text{ then } \prod_{i \in I} A_i \neq \emptyset. \tag{2}$$

(2) can be obtained within the theory, once $n$ and $N$ are formalized. In the
meanwhile, we can view it, formally, not as one statement, but as a compact
way of representing infinitely many statements (a theorem schema): One with
one set $A$ (called $A_1$ in (2)), one with two sets $A$, $B$ ($A_1$, $A_2$), one with three
sets $A$, $B$, $C$ ($A_1$, $A_2$, $A_3$), etc.
As indices we can use, for example, $\{\emptyset\}$, $\{\{\emptyset\}\}$, $\{\{\{\emptyset\}\}\}$, $\{\{\{\{\emptyset\}\}\}\}$, etc., which
are all distinct (why?).†
Does (2) extend to the case of arbitrary (and therefore possibly infinite) $I$?
We consider this question in the next chapter.


We have seen the majority of nonlogical axioms for ZFC in this chapter.
The “C” of ZFC is considered in the next chapter. In Chapter V we will
introduce the last axiom, the axiom of infinity, which implies that infinite sets
exist.‡


III.12. Exercises 


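III.1. Prove $\vdash_{ZFC} \neg U(A) \land \neg U(B) \to (A \subset B \to B \neq \emptyset)$.
III.2. Prove $\vdash_{ZFC} \mathrm{Coll}_x \mathcal{B} \to (\forall x)(\mathcal{A} \to \mathcal{B}) \to \mathrm{Coll}_x \mathcal{A}$.
III.3. Prove $\vdash_{ZFC} U(x) \to x \cap y = \emptyset$.

III.4. Prove $\vdash_{ZFC} U(x) \to x - y = \emptyset$.

III.5. Prove $\vdash_{ZFC} \neg U(x) \land U(y) \to x - y = x$.


III.6. Let $a$ be a set, and consider the class $b = \{x \in a : x \notin x\}$. Show that,
despite similarities with the Russell class $R$, $b$ is a set. Moreover, show
$\vdash b \notin a$. Do not use foundation.


III.7. Show $\vdash R$ (the Russell class) $= U_M$.

III.8. Show that $\vdash_{ZFC} U(x) \to \emptyset \subseteq x$.

III.9. Show that if a class $A$ satisfies $A \subseteq x$ for all sets $x$, then $A = \emptyset$.
III.10. Without using foundation, show that $\emptyset \neq \{\emptyset\}$.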


† We can still write these indices as “1”, “2”, “3”, “4”, etc. (essentially counting the nesting of
{}-brackets), as this is more pleasing visually.

‡ The infinity axiom does not just say that infinite sets exist. It says, essentially, that limit ordinals
exist, which is a stronger assertion.

III.11. Interpret the extensionality axiom over $N$ so that the variables vary over
integers, not sets, and $\in$ is interpreted as “less than”, $<$. Show that under
this interpretation the axiom is true.

III.12. Show that if we have no urelements, and if our axioms are just
extensionality, separation, union, foundation, and collection, then this set
theory
(1) can prove that a set exists, but
(2) cannot prove that a nonempty set exists.
(Hint: Find a model of all the above axioms augmented by the formula
$(\forall y)(\neg U(y) \to (\forall x)x \notin y)$.)

III.13. Suppose we have all the axioms except the one for pairing and the one
that asserts the existence of a set of urelements (III.3.1). Show that these
axioms cannot prove that a set exists.
(Hint: Find a model of all the above axioms augmented by the formula
$(\forall x)U(x)$.)

III.14. (Bourbaki (1966b)) Drop collection version III.8.2, separation, and
union. Add Bourbaki’s axiom of “selection and union”, that is, collection
version (2) of III.8.12, p. 173. Prove that separation and union are
now theorems.

III.15. (Shoenfield (1967)) Drop collection version III.8.2, pairing, and union.
Add collection version (3) of III.8.12, p. 173. Prove that pairing and
union are now theorems.

III.16. (Levy (1979)) Drop collection version III.8.2, pairing, and separation.
Add collection version (4) of III.8.12, p. 173. Prove that pairing and
separation are now theorems.

III.17. Prove
$$U(A) \to P(A) = \{\emptyset\}$$
in ZFC.

III.18. What is $\bigcap \emptyset$ (and why)?

III.19. Show that
(1) $\vdash A \cup B = B \cup A$ and
(2) $\vdash A \cup (B \cup C) = (A \cup B) \cup C$.

III.20. Show that
(1) $\vdash A \cap B = B \cap A$ and
(2) $\vdash A \cap (B \cap C) = (A \cap B) \cap C$.

III.21. For any set $A$ in the “restricted” universe $U_N$ ($N \subseteq M$), show that
$U_N - A$ is a proper class.

III.22. Show for any classes $A$, $B$ that $A - B = A - A \cap B$.

III.23. For any classes $A$, $B$ show that $A \cup B = A$ iff $B \subseteq A$.

III.24. For any classes $A$, $B$ show that $A \cap B = A$ iff $A \subseteq B$.

III.25. For any classes $A$, $B$ show that $A - (A - B) = B$ iff $B \subseteq A$.

III.26. Prove III.6.15(2).

III.27. (1) Express $A \cap B$ using class difference as the only operation.
(2) Express $A \cup B$ using class difference and complement as the only
operations.

III.28. Generalized De Morgan’s laws. Prove for any class $A$ and indexed
family $(B_i)_{i \in F}$ that
(1) $A - \bigcup_{i \in F} B_i = \bigcap_{i \in F} (A - B_i)$
(2) $A - \bigcap_{i \in F} B_i = \bigcup_{i \in F} (A - B_i)$

III.29. Distributive laws for $\cup$, $\cap$. For any classes $A$, $B$, $D$ show
(1) $A \cap (B \cup D) = (A \cap B) \cup (A \cap D)$
(2) $A \cup (B \cap D) = (A \cup B) \cap (A \cup D)$

III.30. Generalized distributive laws for $\cup$, $\cap$. Prove for any class $A$ and
indexed family $(B_i)_{i \in F}$ that
(1) $A \cap \bigcup_{i \in F} B_i = \bigcup_{i \in F} (A \cap B_i)$
(2) $A \cup \bigcap_{i \in F} B_i = \bigcap_{i \in F} (A \cup B_i)$

III.31. Show that we cannot have $a \in b \in c \in \cdots \in a$.

III.32. Show that $V_N$ is a proper class for any set $N$ of urelements (including
the case $N = \emptyset$).

III.33. Show that for any class (not just set) $A$, $A \in A$ is refutable.

III.34. (1) Show that $A$ = “the class of all sets that contain at least one element”
can be defined by a class term.
(2) Show that $A$ is a proper class.

III.35. Attach the intuitive meaning to the statement that the set $A$ has $n$ distinct
elements. Show then, by informal induction on $n \in N$, that for $n > 0$,
if $A$ has $n$ elements, then $P(A)$ has $2^n$ elements.


III.36. Show (without the use of foundation) that $\{\{a\}, \{a, b\}\} = \{\{a'\}, \{a', b'\}\}$
implies $a = a'$ and $b = b'$.

III.37. For any sets $x$, $y$ show that $x \cup \{x\} = y \cup \{y\} \to x = y$.
(Hint: Use foundation.)

III.38. Prove Proposition III.10.13.

III.39. Prove Proposition III.10.14.

III.40. For any $A$, $B$ show that $\emptyset = A \times B$ iff $A = \emptyset$ or $B = \emptyset$.

III.41. For any set of urelements $N$, show that $U_N^3 \subseteq U_N$.

III.42. Distributive law for $\times$. Show for any $A$, $B$ and $D$ that $D \times (A \cup B) =
(D \times A) \cup (D \times B)$.

III.43. Let $F : X \to Y$ be a function, and $A \subseteq Y$, $B \subseteq Y$. Prove
(a) $F^{-1}[A \cup B] = F^{-1}[A] \cup F^{-1}[B]$
(b) $F^{-1}[A \cap B] = F^{-1}[A] \cap F^{-1}[B]$
(c) if $A \subseteq B$, then $F^{-1}[B - A] = F^{-1}[B] - F^{-1}[A]$.
Is this last equality true if $A \not\subseteq B$? Why?

III.44. Let $F : X \to Y$ be a function, and $A \subseteq X$, $B \subseteq X$. Prove
(a) $F[A \cup B] = F[A] \cup F[B]$
(b) $F[A \cap B] \subseteq F[A] \cap F[B]$
(c) if $A \subseteq B$, then $F[B - A] \supseteq F[B] - F[A]$.
Can the above inclusions be sharpened to equalities? Why?

III.45. Which parts, if any, of the above two problems generalize to the case
that $F$ is just a relation?

III.46. Let $G$ be a function and $F$ a family of sets. Prove
(a) $G^{-1}[\bigcup F] = \bigcup_{X \in F} G^{-1}[X]$
(b) $G^{-1}[\bigcap F] = \bigcap_{X \in F} G^{-1}[X]$
(c) $G[\bigcup F] = \bigcup_{X \in F} G[X]$
(d) $G[\bigcap F] \subseteq \bigcap_{X \in F} G[X]$. (Can $\subseteq$ be replaced by $=$? Why?)

III.47. Let $F$ be a function, and $A$ a class. Prove
(a) $F[F^{-1}[A]] \subseteq A$
(b) $F^{-1}[F[A]] \supseteq A$, provided that $A \subseteq \mathrm{dom}(F)$.
Show by appropriate concrete examples that the above inclusions cannot
be sharpened, in general, to equalities.

III.48. Let the function $F$ be 1-1, while $A \subseteq \mathrm{dom}(F)$ is an arbitrary class. Show
that $F^{-1}[F[A]] = A$. State and prove an appropriate converse.

III.49. Let $B \subseteq \mathrm{ran}(G)$. Prove $G[G^{-1}[B]] = B$. State and prove an appropriate
converse.

III.50. Let $F$ be a 1-1 function and $A \subseteq B \subseteq \mathrm{dom}(F)$.
(a) Prove $F[B - A] = F[B] - F[A]$.
(b) Prove a suitable converse.
Is the restriction $A \subseteq B \subseteq \mathrm{dom}(F)$ necessary? Why?

III.51. For any relations $S$, $T$ prove
(1) $(S^{-1})^{-1} = S$
(2) $\mathrm{dom}(S) = \mathrm{ran}(S^{-1})$
(3) $\mathrm{ran}(S) = \mathrm{dom}(S^{-1})$
(4) $(S \cup T)^{-1} = S^{-1} \cup T^{-1}$.

III.52. Prove that if $(S^{-1})^{-1} = S$, then $S$ is a relation. Give an example where
the equality fails. Which ones among (2)-(4) in the previous exercise
hold for arbitrary classes?

III.53. Using only the axioms of union, pairing, and separation, show that if a
function $F$ is a set, then so are both $\mathrm{dom}(F)$ and $\mathrm{ran}(F)$.

III.54. Show for a relation $S$ that if both the range and the domain are sets, then
$S$ is a set.

III.55. Show that if a relation $S$ is a set, then so is $S^{-1}$.

III.56. If $F : A \to B$ is a 1-1 correspondence, show that so is $F^{-1} : B \to A$.


IV 


The Axiom of Choice 


From this chapter and onwards the reader will witness more and more the 
“relaxed proof style” (cf. II.5.9). 


IV.1. Introduction 


The previous chapter concluded with the question, can


$$\text{if } A_i \neq \emptyset \text{ for all } i \in I, \text{ then } \prod_{i \in I} A_i \neq \emptyset \tag{1}$$

where $I = \{1, 2, \ldots, n\}$, be extended to the case of arbitrary (and therefore
possibly infinite†) $I$?
The axiom of choice, AC, says yes.


IV.1.1 Axiom (Axiom of Choice, or AC). If $I$ and $A_a$, for all $a \in I$, are nonempty
sets, then $\prod_{a \in I} A_a \neq \emptyset$.


But why “axiom”? After all, the case for finite $I$ is provable as a theorem,
that is, (1) above. Before we address this question, let us first consider some
more down-to-earth equivalent forms of AC.


IV.1.2 Theorem. The following statements (1), (2), (3), and (4) are provably
equivalent.

(1) AC.

(2) If the set $F$ is a nonempty family of nonempty sets, then there is a function
$g$ such that $\mathrm{dom}(g) = F$ and $g(x) \in x$ for all $x \in F$.

(3) If the set $S$ is a relation, then there is a function $f$ such that
$\mathrm{dom}(f) = \mathrm{dom}(S)$ and $f \subseteq S$.

(4) If the set $F$ is a nonempty family of pairwise disjoint nonempty sets, then
there is a set $C$ that consists of exactly one element out of each set of $F$
(i.e., for each $x \in F$, $C \cap x$ is a singleton).


Note. The function $g$ in (2) above is called a choice function for $F$.


† The terms “infinite” and “finite” throughout this discussion have their intuitive metamathematical
meaning.


Proof. (1) $\to$ (2): Given a set $F$ as in (2). Define $i \overset{\text{def}}{=} \lambda x.x$ with $\mathrm{dom}(i) = F$.
Then $F$ can be viewed as the indexed family $(i(x))_{x \in F}$, or $(x)_{x \in F}$. By (1) there
is a $g \in \prod_{x \in F} x$. Thus, $\mathrm{dom}(g) = F$ and $g(x) \in x$ for all $x \in F$.


(2) $\to$ (3): Given a relation $S$ (set). Let $F \overset{\text{def}}{=} \{S(a) : a \in \mathrm{dom}(S)\}$. $F$ is a
set by III.8.9, since $\mathrm{dom}(S)$ is a set (III.11.8). If $F = \emptyset$, then $S = \emptyset$ and $f = \emptyset$
will do. So let $F \neq \emptyset$. By (2), there is a choice function $g$, i.e., $\mathrm{dom}(g) = F$
and $g(x) \in x$ for all $x \in F$.† In terms of $S$, the last result reads $g(S(a)) \in
S(a)$ for each $a \in \mathrm{dom}(S)$. Clearly, $f \overset{\text{def}}{=} \{\langle a, g(S(a))\rangle : a \in \mathrm{dom}(S)\}$
will do.

(3) $\to$ (4): Let $F$ be as in (4). Define $S \overset{\text{def}}{=} \{\langle x, y\rangle : y \in x \in F\}$.‡ $S$ is a set,
since $S \subseteq F \times \bigcup F$. Now apply (3) to obtain $f \subseteq S$ with $\mathrm{dom}(f) = \mathrm{dom}(S) =
F$. Take $C = \mathrm{ran}(f)$, a set by III.11.28.

To verify, let $x \in F = \mathrm{dom}(f)$. Then $\langle x, f(x)\rangle \in S$; therefore $f(x) \in x$.
This along with $f(x) \in C$ yields $f(x) \in C \cap x$. Let also


$$y \in C \cap x \tag{i}$$


Hence $y \in C$ in particular. Then, for some $z$, $f(z) = y$; therefore $\langle z, y\rangle \in S$
and thus $y \in z$. By (i), $y \in x \cap z$, so that $x = z$ (by the assumption on $F$). This
yields $y = f(z) = f(x)$. Thus, $C \cap x = \{f(x)\}$.

(4) $\to$ (1): Let $(A_a)_{a \in I}$ be an indexed family of nonempty sets ($I \neq \emptyset$ as
well). Let $F \overset{\text{def}}{=} \mathrm{ran}(\lambda a.(\{A_a\} \times A_a))$ for $a \in I$. $F$ is a set by III.11.28, and its
members are pairwise disjoint sets; for if $A_a \neq A_b$, then $\langle A_a, x\rangle \neq \langle A_b, y\rangle$ for
all $x$, $y$, and thus $(\{A_a\} \times A_a) \cap (\{A_b\} \times A_b) = \emptyset$. By (4) there is a set $C$ such
that $C \cap (\{A_a\} \times A_a)$ is a singleton for all $a \in I$.

Define $f \overset{\text{def}}{=} \{\langle a, \delta(y)\rangle : \langle a, y\rangle \in I \times (C \cap (\{A_a\} \times A_a))\}$, which is a set
by III.8.9, and obviously a function by the previous remark. Now, $\mathrm{dom}(f) = I$
and $f(a) = \delta(y) \in A_a$ for all $a \in I$; hence $f \in \prod_{a \in I} A_a$, that is, $\prod_{a \in I} A_a \neq \emptyset$.


† $x \in F \leftrightarrow (\exists a \in \mathrm{dom}(S))x = S(a)$ (cf. III.8.7).
‡ By “$y \in x \in F$” I mean “$y \in x \land x \in F$”, i.e., using $\in$ conjunctionally.
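For finite families no axiom is needed, so the constructions used in the steps (3) to (4) and (4) to (1) can be traced concretely. The sketch below is informal and rests on stated assumptions: sets are Python `frozenset`s, functions and families are dictionaries, all helper names are hypothetical, and the "least element" rule is merely one explicit way to pick a function inside a finite relation.

```python
def relation_from_family(F):
    """S = {(x, y) : y in x in F}, as in the step from (3) to (4)."""
    return {(x, y) for x in F for y in x}

def function_inside(S):
    """A function f included in S with dom(f) = dom(S), as (3) promises.

    For a finite S we can be explicit: keep the least y for each x
    (assumption: the second components paired with one x are comparable).
    """
    f = {}
    for x, y in S:
        if x not in f or y < f[x]:
            f[x] = y
    return f

def choice_set(F):
    """C = ran(f): one element out of each member of a disjoint family F."""
    return set(function_inside(relation_from_family(F)).values())

def choice_function(family):
    """Step (4) to (1): family maps each index a to a nonempty frozenset A_a."""
    # Tag each A_a as {A_a} x A_a so that distinct sets become disjoint.
    tagged = {a: frozenset((A, y) for y in A) for a, A in family.items()}
    C = choice_set(set(tagged.values()))
    # f(a) = second component of the unique element of C ∩ ({A_a} x A_a).
    return {a: next(iter(C & tagged[a]))[1] for a in family}

F = {frozenset({1, 2}), frozenset({3}), frozenset({4, 5, 6})}
C = choice_set(F)

family = {"p": frozenset({1, 2}), "q": frozenset({2, 3}), "r": frozenset({1, 2})}
f = choice_function(family)
```

Note that `family` is not pairwise disjoint and even repeats a member; the tagging step is what makes (4) applicable, exactly as in the proof.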


We will concentrate on the equivalence of AC with (4), due to the latter’s
intuitive appeal. Is now (4) “really true”? Is it possible to choose one element
out of each set $x$ of a possibly infinite† set $F$ of nonempty pairwise disjoint
sets, in order to form a set $C$? In the finite case we can literally “choose” each
representative from each $x \in F$, for if there are $n$ such choices, we can fit them in
a proof that is about $n$ lines long (see the proof of (1) at the closing of Chapter III).
We cannot do the same in the infinite case, for proofs must have finite length.
Of course, it often happens that we can describe an infinite process (such as
an infinite sequence of choices) in a finite manner. For example, if $F$ consists
exclusively of nonempty subsets of $N$, then we can define $C$ compactly without
having to list our infinite set of choices: $C = \{y : y \text{ is smallest in } x \land x \in F\}$.
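The compact rule “take the smallest element” is a finite description of infinitely many choices. As an informal sketch, assume each member of $F$ is given by a membership test on the natural numbers (the labels, predicates, and function names below are hypothetical); the least element is then found by searching upward from 0, a search that stops precisely because each member is nonempty.

```python
from itertools import count

def least_element(member):
    """Least n in N with member(n) true.

    Assumption: the set described by `member` is nonempty, so the search stops.
    """
    return next(n for n in count() if member(n))

def choice_set(F):
    """C = {y : y is smallest in x and x in F}; F maps labels to predicates."""
    return {least_element(member) for member in F.values()}

# Three pairwise disjoint infinite subsets of N, given by membership tests.
F = {
    "multiples of 3": lambda n: n % 3 == 0,
    "1 mod 3, above 10": lambda n: n % 3 == 1 and n > 10,
    "2 mod 3": lambda n: n % 3 == 2,
}
C = choice_set(F)
```

No list of individual choices is ever written down; the single rule in `least_element` serves every member of the family.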

One point of view maintains that to accept the existence of a set like C we 
must be able to give a “rule” or “unambiguous definition” A[y] — just as in 
the example above. Holders of this viewpoint do not accept AC as a legitimate 
axiom. They argue that in the absence of “structure” in the set members x of 
F, all the elements of such x “look alike’, and therefore the infinite process of 
“choosing” cannot be compacted into a finite well-defined description. This is 
true even for very small sets $x$ (it is the size of $F$, not that of $x \in F$, that creates
the problem).‡

A well-known example due to Russell contrasts an infinite set of pairs of 
shoes with an infinite set of pairs of socks. In the former case the set C can 
be defined compactly to consist of, say, the left shoe out of each pair. In the 
case of socks this “rule” does not define well which sock to pick, because, even 
though they are distinct objects, the two socks in a pair cannot be distinguished 
by “left” vs. “right”. 

The other philosophical point of view accepts that sets exist outside our- 
selves despite our frequent inability to define them “well”, or to describe them. 
Thus, the choice set $C$ is some arbitrary “partition”§ of the objects of “real”
mathematics into members (of $C$) and non-members. As such, it exists whether
we can define it well or not.¶


† Intuitively speaking, for now.

‡ If each $x \in F$ is a singleton, then, of course, $C$ can be well defined.

§ We do not attach any technical significance to the term “partition” here.

¶ It is conceivable that someone some day may come up with a way to describe how to choose one
sock from each one of an infinite collection of sock pairs. It would be therefore unwise to say “it
cannot be done” simply because you or I cannot do it today.

Under this Platonist interpretation of “existence”, a set of representative
socks, one out of each pair in an infinite set of pairs, certainly exists. More
generally, (4) and, equivalently, AC are “true”.


IV.2. More Justification for AC; 
the “Constructible” Universe Viewpoint 


Throughout this section we reason Platonistically in the domain of informal (or
“real”) set theory. Within the set-formation-by-stages doctrine we can argue
the reasonableness of AC, paraphrasing Gödel’s proof of the consistency of AC
with the remaining axioms of ZF (Gödel (1939, 1940)).

Let us take seriously the challenge that AC does not provide us with a well- 
defined “rule” to effect the potentially infinitely many choices. Thus, for the 
balance of this section we will respond by demanding that all sets — not just 
those that AC asserts to exist — be “given” by a well-defined rule. This just 
levels the playing field. 

In particular, when we apply the power set operation, $P(A)$, to an infinite set
$A$, we will accept that only those subsets of $A$ that are “well-definable” exist
(i.e., as sets; we will soon make this requirement precise).‡


This attitude is similar to the one that separates collections into sets and proper
classes: That some collections are not sets is a situation we are by now
comfortable with. In this section we further narrow down what collections we will
accept as sets in defense of AC.

This is a local restriction, however, valid only in this section. In the remainder
of the volume we revert to our understanding of “real” sets as this was explained
in Chapter II (II.1.3).


† The reader will observe that all we are doing here is arguing that a proposed new axiom is
reasonable. This is a process we have been through for all the previous axioms, and it does 
not constitute a proof of the axioms in the metatheory. The notion “reasonable” is not tempo- 
rally stable. When Cantor introduced set theory, the entire theory was “unreasonable” to many 
mathematicians of the day — including influential ones like Poincaré, who suggested that most of 
Cantor’s set theory ought to be discarded. When Russell proposed to found mathematics on logic, 
this too was considered as an “unreasonable” point of view. For example, Poincaré protested that 
this was tantamount to suggesting that the whole body of mathematics was just a devious way 
to say.4 <4. As mathematics progresses, mathematicians become more ready to accept the 
reasonableness of formerly “unreasonable” concepts or statements. 

‡ “Well-definable” is just an emphatic way of saying “(first order) definable”, in the sense that
we can write these sets down as class terms. We have already remarked (cf. II.4.2(b)) that
we cannot expect all subsets of $N$ to be first order definable, for there are far too many of
them.

The definition of the “well-definable” sets will be given by formulas of set
theory. We hope to be able to “sort” all possible such formulas in “ascending
order”, indicating such an order by the symbol “$\prec$”. Next, we will see that $\prec$
on formulas naturally induces an order on the defined sets. To avoid notational
confusion, we will use the symbol “$\lhd$” for this induced order on sets. With some
luck, $\lhd$ will arguably be a well-ordering, i.e., each nonempty class will have
a minimum element (with respect to $\lhd$). If all this succeeds, we will have two
things:


(1) A restricted universe of sets, $L$, where all sets are definable (all the other
sets are ignored, that is, banned from “sethood”).
(2) There will be a well-ordering, $\lhd$, on this restricted universe.


Thus, if $A$ is any set of nonempty sets, a choice function for $A$ can be
(well) defined by setting $f(x)$ equal to the minimum $a \in x$ with respect to the
ordering $\lhd$.

To begin with, we will need a judicious reinterpretation of what is going on 
at each stage of set construction. 


A stage of set construction is one of two possible types: a collecting type, or a 
powering type. 

At a collecting stage one collects into a set all the objects that are available 
so far. In particular, since the urelements are given outright, the 0th stage is a
collecting stage, at which the set of all urelements, call it M, is formed. At any 
subsequent collecting stage, we form the union of all the sets that were formed 
at all previous stages. 


At a powering stage we form the set of all well-definable subsets of the set 
formed at the immediately previous stage — a sort of truncated power set. 


The stages occur in the following order, defined inductively:†


(i) The 0th stage is a collecting stage at which the set of all urelements, $M$, is
formed.

(ii) If at the arbitrary collecting stage the set $X$ has been formed, then this
stage is followed immediately by infinitely many powering stages, to form
the sets $X_1, X_2, \ldots, X_i, \ldots$, where


$$X_1 = D(M \cup X) \quad \text{and, for } n \ge 1, \quad X_{n+1} = D(M \cup X_n) \tag{1}$$

† The following informal definition is adequate for our informal discussion. A precise version
will be given with the help of ordinals (formal counterparts of “stages”) when we revisit the
constructible universe in Chapter VI.

In (1), $D(A)$ denotes the set of all definable subsets of $A$. We will soon
make the meaning of the term $D(A)$ precise, but for the time being let us
imagine that it is a pared-down version of $P(A)$, that is, $D(A) \subseteq P(A)$ for
infinite sets $A$.†

(iii) Immediately after each such infinite sequence of powering stages, a
collecting stage occurs to form the union of all the sets formed at all the
previous stages.


This process, alternating between (ii) and (iii), continues ad infinitum and
constructs all sets (all definable sets really, but you will recall that in this section
we pretend that these are the only legitimate sets anyway).
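The first few powering stages can be computed explicitly in a toy setting. The sketch below makes two simplifying assumptions that are not those of the main text: there are no urelements ($M = \emptyset$), and, all sets involved being finite, $D(A)$ is taken to be the full power set $P(A)$. The function names are hypothetical.

```python
from itertools import chain, combinations

def powerset(A):
    """P(A) for a finite set A, returned as a frozenset of frozensets."""
    items = list(A)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
    return frozenset(frozenset(s) for s in subsets)

def powering_stages(M, X, how_many):
    """X_1 = D(M ∪ X), X_{n+1} = D(M ∪ X_n), with D = P on finite sets."""
    stages = []
    current = X
    for _ in range(how_many):
        current = powerset(M | current)
        stages.append(current)
    return stages

M = frozenset()                        # assumption: no urelements
X1, X2, X3, X4 = powering_stages(M, M, 4)
sizes = [len(X1), len(X2), len(X3), len(X4)]
collected = X1 | X2 | X3 | X4          # finite analogue of a collecting stage
```

In this toy setting each stage contains the previous one, so collecting the first four stages returns `X4` itself; only after infinitely many powering stages does collecting produce something genuinely new.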


IV.2.1 Remark. The need for collection, after each sequence of powering,
should be clear. For example, if we stop the process after the first sequence
of powering, then, even though we have constructed sets with arbitrary integer
depth of nesting of {}-brackets, we have not constructed a single set that contains
as elements sets with all possible depths of nesting of {}-brackets.

Specifically, if $X_1, X_2, \ldots, X_n, \ldots$ constitutes the first sequence of
powering, then


$$\emptyset \in X_1$$

and for $n \ge 1$,


$$\underbrace{\{\cdots\{}_{n}\,\emptyset\,\underbrace{\}\cdots\}}_{n} \in X_{n+1}$$


But none of the $X_i$ contains all of the

$$\underbrace{\{\cdots\{}_{n}\,\emptyset\,\underbrace{\}\cdots\}}_{n}$$

for all $n$.


It is useful to observe the following important property of each set $Y$
constructed at some stage: If $x \in y \in M \cup Y$, then $x \in M \cup Y$.‡
The claim is trivially true if $y$ is an atom (see the previous footnote).


† Of course, in principle, we can list explicitly all subsets of any finite set, so that for finite sets $A$
we intuitively accept that all their subsets are definable, i.e., $D(A) = P(A)$ in this case.

‡ A set $S$ that satisfies $a \in b \in S \to a \in S$ (that is, $a \in b \land b \in S \to a \in S$) is called transitive.
Such sets play a major role in set theory; all ordinals are transitive, to make the point. Here are
two simple examples:


(a) $\{\#, ?\}$, where # and ? are urelements ($a \in b \in S \to a \in S$ is true for $b$ an urelement, since
then $a \in b \in S$ is false)


(b) $\{\emptyset, \{\emptyset\}\}$.

We say “true” and “false” freely in this section, since we are working, like
Platonists, in the metatheory.


If $y$ is a set, then we rephrase what we want to prove as follows:

$$y \in M \cup Y \to y \subseteq M \cup Y \tag{2}$$


We prove (2) by induction on stages (that we went through towards building
$Y$). To this end, we need to verify the following:


(i) The set $Y$ constructed at the 0th stage has the property.
(ii) The property propagates with collecting.
(iii) The property propagates with powering.


As for (i), this is true because the $Y$ formed at stage 0 is $M$, and $M$ is transitive
(see the footnote to the claim) or, to put it another way, $y \in M \cup Y$ is
false ($M \cup Y = M$ contains only atoms, while $y$ is a set).

As for (ii), let $Y = \bigcup\{Z, W, \ldots\}$ be formed at a collecting stage, where
$Z, W, \ldots$ are all the sets formed at all the previous stages. Let $y \in M \cup Y$.
Thus, $y \in Y$ (for $M$ contains only atoms); hence


$$y \in (\text{say})\ W \subseteq M \cup W$$


By the induction hypothesis (I.H.), $y \subseteq M \cup W$, and, since $W \subseteq Y$, it follows
that $y \subseteq M \cup Y$.

As for (iii), let $Y = D(M \cup X)$, where $X$ is the set we have built at the
immediately previous stage. Let $y \in M \cup Y$ be true and take any $x \in y$. Again,
$y \in Y$, thus $y \subseteq M \cup X$, hence $x \in M \cup X$.

Now, if $x \in M$, then $x \in M \cup Y$. If not, then $x$ is a set, and I.H. yields
$x \subseteq M \cup X$, from which follows $x \in Y$. Thus, in either case, $x \in M \cup Y$, and
($x$ being arbitrary) $y \subseteq M \cup Y$.
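For finite examples, transitivity in the sense used above can be tested mechanically. The sketch below is informal: sets are Python `frozenset`s, any other object (here, strings) plays the role of an urelement, and the name `is_transitive` is hypothetical.

```python
def is_transitive(S):
    """True iff a ∈ b ∈ S implies a ∈ S; non-frozenset members count as atoms."""
    return all(a in S for b in S if isinstance(b, frozenset) for a in b)

empty = frozenset()
only_atoms = frozenset({"#", "?"})                  # atoms have no members
two_levels = frozenset({empty, frozenset({empty})}) # the set {∅, {∅}}
not_transitive = frozenset({frozenset({empty})})    # {{∅}} lacks ∅ itself
```

The first two sets pass the test, for the reasons given in the footnote to the claim; the third fails because $\emptyset \in \{\emptyset\} \in \{\{\emptyset\}\}$ while $\emptyset \notin \{\{\emptyset\}\}$.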


IV.2.2 Remark. We state an important by-product of the transitivity of the sets
$M \cup X$, where $X$ has been obtained in our construction. If $Y = D(M \cup X)$ for
some such $X$, then we have that


$$X \subseteq M \cup Y$$


Indeed, let $x \in X$. If $x \in M$, then we are done. Else, $x$ is a set and $x \in M \cup X$;
therefore $x \subseteq M \cup X$. It follows that $x \in Y$.


We now turn to how the sets obtained at powering stages are actually
“defined”. Let us “sort” in an arbitrary fixed way the alphabet of the first-order
language of logic that we have been using all along (we will use the symbol $\prec$
to indicate the assumed order on the alphabet, $\mathcal{A}$, of logical and nonlogical
symbols). Now,


$$\mathcal{A} = \{\forall, \exists, \neg, \land, \lor, \to, \leftrightarrow, =, (, ), v, |, \in, U\}$$


where the symbols $v$ and $|$ are used to build the object variables $v_0, v_1, v_2, \ldots$
as $v|v, v||v, v|||v, \ldots$ (as usual, we can use abbreviations such as $x, y, z$, with
or without primes or subscripts, for variables). Let us fix the order


$$\forall \prec \exists \prec \neg \prec \land \prec \lor \prec \to \prec \leftrightarrow \prec {=} \prec ( \prec ) \prec v \prec | \prec {\in} \prec U \tag{3}$$

for $\mathcal{A}$.
We next augment our alphabet to include the names $\bar 0, \bar 1, \bar 2, \ldots, \bar n, \ldots$ of all
urelements,† and exactly one name for each definable‡ set.


As a matter of notation, if $c$ is (I mean, informally names) a definable set,
then $\bar c$ will denote the unique name for $c$ that we import into our alphabet $\mathcal{A}$.
In particular, the horrible notation $\vec{\bar c}_n$ will stand for the sequence of names
$\bar c_1, \ldots, \bar c_n$.


† It might be thought (with some justification) that we are cheating somewhat here by taking $M$,
the set of all urelements, to be N. Recall however that all that we are after is to 


(1) give a philosophically plausible description of what sets are, and 
(2) within this description argue that AC holds. 


In other words, following Gödel, we are proposing an informal and plausible universe of “real”
sets. 

We have chosen N as the set of urelements because AC holds on it by the least integer principle. 
How well does this choice hold philosophically, i.e., how well are we serving requirement (1) 
above? Well, it should not be too difficult to accept the view that the primeval “real stuff” of 
mathematics — the atomic objects — is the natural numbers, and that all else in mathematics 
we build starting from these numbers. After all, one of the most careful among the fathers of 
foundations, Kronecker, had no trouble with this position at all. He is said to have held that “God 
created the integers; all else is the work of man”. Mind you, Kronecker, the mentoring father of 
intuitionism and a confirmed finitist, did not allow for the entire set of natural numbers, but only 
granted you the right of having as many numbers as you wanted by simply adding one to the last 
one you have had. 

Even technically, one can argue that the choice of such a small set of urelements does not 
restrict our ability to use sets to do mathematics, for it turns out that even a smaller set works 
(i.e., leads to a set theory that is sufficiently rich for the purposes of doing mathematics). Namely, 
as we shall see in Chapter V, von Neumann has shown how to build the natural numbers and, 
therefore, also Kronecker’s “all else”, starting from @. 

‡ Definable in the process that we are describing. By the way, introducing a unique name for each
“real object” of a collection is a trick that we have already used in describing the semantics of
first order languages in I.5.4.

Our goal is to extend the order $\prec$ from $\mathcal{A}$ to all names, and then to induce
it on the named objects, that is, all objects of our definable universe.
Thus, what we have set out to do is to achieve


$$a \lhd b \quad \text{iff} \quad \bar a \prec \bar b \tag{4}$$

We do this in stages, starting by extending (3) into


$$\forall \prec \exists \prec \neg \prec \land \prec \lor \prec \to \prec \leftrightarrow \prec {=} \prec ( \prec ) \prec v \prec | \prec {\in} \prec U \prec \bar 0 \prec \bar 1 \prec \cdots \tag{5}$$


It is trivial that $\prec$ in its present stage of definition given by (5) is a well-ordering,
that is, every nonempty set of symbols in $\mathcal{A} \cup \{\bar n : n \in N\}$ has a $\prec$-smallest
element.


IV.2.3 Definition. A set $a$ is definable from $X$, a formula $\mathcal{P}(v_0, v_1, \ldots, v_n)$
over the initial alphabet (3), and the parameters $b_1, \ldots, b_n$ iff


$$a = \{x \in X : \mathcal{P}(\bar x, \bar b_1, \ldots, \bar b_n) \text{ is true in } X\}$$


where the (constant) objects $b_i$, named by $\bar b_i$, are all in $X$.

“Is true in $X$” means that the truth value is “computed” by restricting
all bound variables of the sentence $\mathcal{P}(\bar x, \bar b_1, \ldots, \bar b_n)$ to vary over $X$.‡ That is, an
occurrence of $(\exists y)$ or $(\forall y)$ in the formula means $(\exists y \in X)$ or $(\forall y \in X)$
respectively.


$D(X)$ denotes all sets $a$ definable from $X$ and parameters in $X$ (for all $\mathcal{P}$
over the initial alphabet (3)).
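A small finite example may help fix the idea of “true in $X$”. In the sketch below, which is informal and not the official definition, a formula is modeled by a Python predicate that receives $X$ explicitly and is expected to let its quantifiers range over $X$ only; sets are `frozenset`s, and the names `definable`, `nonempty`, and `member_of` are hypothetical.

```python
def definable(X, formula, *params):
    """{x in X : formula(X, x, *params)}, with all parameters taken from X."""
    if not all(b in X for b in params):
        raise ValueError("parameters must lie in X")
    return frozenset(x for x in X if formula(X, x, *params))

empty = frozenset()
one = frozenset({empty})
X = frozenset({empty, one})

def nonempty(X, x):
    """(∃y) y ∈ v0, read in X as (∃y ∈ X) y ∈ x."""
    return any(y in x for y in X)

def member_of(X, x, b):
    """v0 ∈ v1, with the parameter b substituted for v1."""
    return x in b

a1 = definable(X, nonempty)          # the members of X that have a member in X
a2 = definable(X, member_of, one)    # the members of X that belong to `one`
```

Here `a1` is $\{\{\emptyset\}\}$ and `a2` is $\{\emptyset\}$, two definable subsets of $X = \{\emptyset, \{\emptyset\}\}$; the second uses a parameter, the first does not.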


We will well-order the class of all definable sets by well-ordering their 
definitions. 


We do not append all the names $\bar c$ to $\mathcal{A}$ at once. We have only appended the
names of the atoms so far to form the alphabet (5). There are two good reasons
for this: One, we will augment our formal symbol set by stages, so that as it
grows it stays (provably) well-ordered. Two, we will add a name only after its
corresponding set has been seen to be definable; for, conceivably, not all sets
are definable.


† Well-orderings will be studied in detail in Chapter VI.


‡ Why $\bar x$ in $\mathcal{P}(\bar x, \bar b_1, \ldots, \bar b_n)$? This is because each object $x \in X$ that we check for membership in $a$ enters
the defining formula $\mathcal{P}$ via its name.

In order to keep the notation simple as we append more symbols, we will
continue naming the so augmented alphabet “$\mathcal{A}$”. Thus, (5) depicts the version
of $\mathcal{A}$ that we have immediately after appending the names of all the atoms to
the initial alphabet (3).

As usual, $S^+$ denotes the set of all nonempty strings of symbols from $S$, while
$S^i$, for $0 < i \in N$, denotes the set of strings of length $i$. Thus, $S^+ = \bigcup_{i > 0} S^i$
(cf. I.1.4).


IV.2.4 Lemma. Let < be a well-ordering on the set S. We extend < to S⁺ by
the following rules, but still call it <.

• We order by increasing string length, i.e., all the elements of Sⁱ precede
those of Sⁱ⁺¹.

• In each equal-length group (i.e., each Sⁱ) we order the strings lexicographically
(as in dictionaries), that is, of two unequal strings, the smaller is the
one that in the leftmost position of disagreement contains the smaller of the
two disagreeing symbols of S.

Then < on S⁺ is a well-ordering.


Proof. Let ∅ ≠ C ⊆ S⁺. Pick any string a₁a₂…aₙ in C of shortest length n. We
define a sequence of transformations:

    Transform a₁a₂…aₙ to ã₁a₂…aₙ

where ã₁ is the <-smallest symbol in S (S is well-ordered!) that begins some string of
length n in C, and a₂…aₙ now denotes the tail of one such string, so that ã₁a₂…aₙ ∈ C.‡
In general, assuming that ã₁ã₂…ãᵢaᵢ₊₁…aₙ ∈ C has been defined,

    Transform ã₁…ãᵢaᵢ₊₁…aₙ to ã₁ã₂…ãᵢãᵢ₊₁aᵢ₊₂…aₙ

where ãᵢ₊₁ is the <-smallest symbol in S that follows the prefix ã₁…ãᵢ in some string of
length n in C (again renaming the tail, so that ã₁…ãᵢ₊₁aᵢ₊₂…aₙ ∈ C).
Thus, we have defined, by induction on i (< n), a <-smallest element

    ã₁…ãₙ of C.
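The ordering of Lemma IV.2.4 (length first, then leftmost disagreement) is often called the shortlex order. The following is only a minimal sketch for a finite alphabet; the names `ALPHABET`, `shortlex_key` and `smallest`, and the particular order c < a < b, are assumptions made for illustration and do not come from the text.

```python
# Illustrative sketch only: a finite alphabet S with an assumed order c < a < b.
ALPHABET = ["c", "a", "b"]
RANK = {symbol: position for position, symbol in enumerate(ALPHABET)}

def shortlex_key(string):
    """Sort key: shorter strings first, then lexicographic order w.r.t. RANK."""
    return (len(string), [RANK[ch] for ch in string])

def smallest(strings):
    """Return the <-smallest string of a finite nonempty collection."""
    return min(strings, key=shortlex_key)

C = {"ab", "ba", "cab", "bb"}
least = smallest(C)  # a shortest string whose leftmost symbols are smallest
```

Here the three strings of length 2 precede "cab" regardless of their symbols, and among them the leftmost disagreement decides.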


IV.2.5 Corollary. For the version of 𝒜 given by (5), 𝒜⁺ is well-ordered by <.

† Of course, a string over S of length i is just a member of the i-copy Cartesian product Sⁱ:
⟨x₁, …, xᵢ⟩. One usually writes strings without the angular brackets, and without the comma
separators, like this: x₁x₂…xᵢ. Naturally, if the latter notation becomes ambiguous (e.g., if
S = {0, 00} then 00 might be either ⟨00⟩ or ⟨0, 0⟩ in vector notation) then we revert to the vector
notation.

‡ ã₁ may be the same as a₁.
So far, we have at stage 0:

• a well-ordering ◁ of M, defined to be equal to the “standard <” on the set
of atoms (since M = N), and

• a relation < satisfying (4) on M because of (5).


This observation validates the basis of the induction on stages that we now 
embark upon. 


Assume then that ◁ has been extended to a well-ordering on the set of all
objects M ∪ X defined so far, in such a way that (4) holds, where < is a
well-ordering on the set of all symbols and names that we have up to now (this
augmented set is still called 𝒜), and moreover assume that the present stage
to be “executed” is a powering stage that will yield Y = D(M ∪ X).


We now extend ◁ to Y by cases:
Let {a, b} ⊆ Y.

Case 1. {a, b} ⊆ X. (This is a legitimate case by Remark IV.2.2.) Then, a ◁ b
is already defined, and we do not alter it. By I.H., (4) holds.

Case 2. a ∈ X and b ∉ X. Then define a ◁ b and ā < b̄. Thus (4) still holds.

Case 3. a ≠ b, where a ∉ X and b ∉ X (i.e., both are “new” objects, hence
sets). Let

    a = {x ∈ M ∪ X : 𝒫(x̄, ā₁, …, āₙ) holds in M ∪ X}    (6)

and

    b = {x ∈ M ∪ X : 𝒬(x̄, b̄₁, …, b̄ₘ) holds in M ∪ X}    (7)

where the parameters aᵢ and bⱼ are in M ∪ X.

By Corollary IV.2.5, < extends from 𝒜 to 𝒜⁺ as a well-ordering.
Now, every definable set will be defined infinitely many times in this process.†


Thus, to extend ◁ and < by adding to them the pairs ⟨a, b⟩ and ⟨ā, b̄⟩ (or
⟨b, a⟩ and ⟨b̄, ā⟩, as the case may be) respectively, we look, in a sense, for
the “earliest construction times” for a and b, or, more conveniently, for the
<-smallest definition.


(i) Of all the possible formulas 𝒫(v₀, v₁, …, vₙ), over the initial alphabet (3), with
free variables v₀, …, vₙ, that can define the set a at this stage via appropriate
parameters, denoting members of M ∪ X, substituted into the variables,
we choose the <-smallest in (6). Similarly, of the possible 𝒬(v₀, v₁, …, vₘ) that
can define the set b at this stage, we choose the <-smallest in (7). This
invokes IV.2.5.

(ii) Of the possible parameter strings ā₁…āₙ that work in conjunction with
the formula 𝒫(v₀, v₁, …, vₙ) chosen in (i) above to define a as in (6), we choose
the <-smallest. Similarly for b. This also invokes IV.2.5.

† For example, a defining formula 𝒫 defines the same set as 𝒫 ∨ 𝒫, or ¬¬𝒫, etc., or any other
formula 𝒬 for which the equivalence 𝒫 ↔ 𝒬 holds, trivially or not.


After having exercised all this caution, and having chosen 𝒫, 𝒬, ā₁…āₙ, and b̄₁…b̄ₘ as
directed in (i) and (ii) above, we extend ◁ and < by defining

    a ◁ b and ā < b̄ iff 𝒫(v₀, v₁, …, vₙ) < 𝒬(v₀, v₁, …, vₘ),
                     or
                     𝒫(v₀, v₁, …, vₙ) ≡ 𝒬(v₀, v₁, …, vₘ) (equal as strings),
                     and ā₁…āₙ < b̄₁…b̄ₘ

where the < to the right of “iff” is meaningful, since the involved strings are
in 𝒜⁺. The “normalization” of 𝒫 and 𝒬 used in (6) and (7) ensures that the
extension of ◁ (and hence of <) is well defined, and is still a well-ordering,
since the < to the right of “iff” is.
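The comparison of two normalized definitions can be pictured as a two-level key: the defining formulas are compared first, and the parameter strings are consulted only when the formulas coincide as strings. The sketch below is only an illustration under stated assumptions: formulas and parameter strings are modelled as tuples of symbols, and the toy alphabet `RANK` and the function names are hypothetical, not part of the text.

```python
def shortlex_key(symbols, rank):
    """Order strings by length, then by leftmost disagreement (as in IV.2.4)."""
    return (len(symbols), [rank[s] for s in symbols])

def definition_key(formula, params, rank):
    """Formulas are compared first; parameter strings only break ties."""
    return (shortlex_key(formula, rank), shortlex_key(params, rank))

def defined_earlier(def_a, def_b, rank):
    """True when the normalized definition def_a precedes def_b."""
    return definition_key(*def_a, rank) < definition_key(*def_b, rank)

# Toy alphabet (an assumption for illustration): P < Q < 0 < 1.
RANK = {"P": 0, "Q": 1, "0": 2, "1": 3}
a = (("P",), ("1",))  # formula "P" with parameter string "1"
b = (("Q",), ("0",))  # formula "Q" with parameter string "0"
c = (("P",), ("0",))  # same formula as a, smaller parameter string
```

With these toy data, a precedes b because its formula is smaller, while c precedes a because the formulas agree and the parameter string of c is smaller.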


This settles the induction step with respect to powering stages, having
extended ◁ to Y so that (4) holds.

Suppose finally that the stage we are about to execute is a collecting stage
that builds Y as ⋃{X, W, Z, …}. By I.H., ◁ is a well-ordering on each of
X, W, Z, ….

Let a, b be in Y. Then a, b are in X, say. Then a ◁ b is already defined and
satisfies (4); thus we need do nothing further.†

This concludes the definition of ◁ as a well-ordering of the entire class of
definable sets and atoms, L.


We have obtained more than what we set out to achieve: 


IV.2.6 Metatheorem (Strong, or Global, Choice). If F is any class‡ of mutually
disjoint nonempty sets R, S, …, then there is a class T that consists of
exactly one element from each of R, S, ….

† Since ◁ is updated only at powering stages as new sets get constructed, it is never redefined
during the normalization (i) and (ii) above. Thus it cannot be that a ◁ b in X while ¬ a ◁ b in,
say, W above. The reader will also observe that IV.2.2 validates our contention that a and b are
both in some earlier “X”.

‡ Not just set.
Proof. Put in T the ◁-smallest element out of each x ∈ F.
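The idea of the proof, selecting the smallest element of each member once a well-ordering of everything is available, can be tried out on a small finite family. This is only a sketch: the usual order on integers stands in for the global well-ordering, and the names `choice_class`, `family` and `T` are chosen here for illustration.

```python
def choice_class(family, key):
    """Collect the smallest element (w.r.t. key) of each member of the family."""
    return {min(member, key=key) for member in family}

# A family of mutually disjoint nonempty sets (toy data).
family = [{7, 3, 9}, {4, 8}, {1}]

# The usual order on integers plays the role of the well-ordering here.
T = choice_class(family, key=lambda x: x)
```

Because the members are pairwise disjoint, T meets each of them in exactly one element.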


IV.2.7 Corollary. AC holds in L. 


Proof. If F is a set, then so is T by collection (e.g., II.11.28, p. 206). 


IV.2.8 Remark. (1) Our notational apparatus does not allow higher order objects
that are collections of (possibly) proper classes. If for a minute such objects
were allowed, and if ℱ is one of them and it happens to contain mutually disjoint
nonempty classes (not just sets), then a higher order collection T exists
which contains exactly one element out of each x ∈ ℱ. Just use the ◁-smallest
out of each x.

(2) Informally, we have established the acceptability (= informal “truth”, 
modulo some appropriate understandings of what the real sets really are and 
how they come about) of a strong choice principle, and hence of AC. We did 
this under two opposing “philosophies” regarding set existence, a Platonist’s 
approach (p. 217) and, subsequently, a definability or constructibility approach. 

It must be conceded that under the philosophy “existence = definability”, 
even though the argument itself that < exists and is a well-ordering of the universe 
is sound and can be promoted into a rigorous proof within formal set theory 
once we learn about ordinals, the background hypotheses could be attacked on 
the grounds that “real” sets might not be constructed in the manner we have 
assumed. The whole argument was a “what if”.† In particular, there might be
dissent on the choice of urelements, on what is going on at stages, on the use 
of exclusively first order formulas in defining sets, etc. 

Let us be content with the fact that at least the plausibility if not as much as 
“proof” of a strong AC has been established under this philosophy, because the 
picture suggested of what sets are is intuitively pleasing and natural. 

(3) In an axiomatic approach to set theory one adopts certain basic axioms 
which are plausible (or, more boldly, “true”) and adequately describe our a 
priori perception of the nature of sets. The latter means that the axioms must 
also be sufficiently strong to imply as many “true” statements about sets as 
possible. 

There are two difficulties regarding these requirements. The first is a technical
difficulty, pointed out by Gödel (incompleteness theorem), namely, that there
are axiomatic theories (set theory, unfortunately, being one such) which are
incompletable, i.e., as long as they are consistent, they can never capture all the
true sentences that they are intended to capture (as theorems), no matter how
many axiom schemata we add (even an infinite number, as long as the formulas
that are axioms are recognizable as such).

† That is, a construction of a model for ZFC. The reader will note that in this “model” we only
verified AC. Of course, one must verify all the nonlogical axioms in order to claim that a structure
is a model. However, since we will revisit the constructible universe formally we chose here to
only deal with our immediate worry: the “truth” of our newest axiom, AC.

The other difficulty has to do with limitations of our intuition (of course, 
intuition advances and becomes more permissive as mathematical culture de- 
velops). We do not know a priori what statements are supposed to be true (a 
good thing this: otherwise mathematicians would be out of business), and coun- 
terintuitive consequences of otherwise perfectly plausible axioms (shall I say 
“true”?) may unfairly reflect badly on the axioms themselves, in any mathe- 
matical culture that is not of sufficiently high order for mathematicians to know 
better. 

That “perfectly acceptable” axioms can lead to theorems that will seriously 
challenge one’s intuition cannot be better illustrated than by Blum’s speed-up 
theorem† in computational complexity theory. This theorem follows from the
only two axioms of the theory, both of which are outright “true”. 

The theorem says that there is a computable function f on N with values in 
{0, 1} which is so difficult to compute that for any program that computes f 
there is another program that computes it significantly faster for all but finitely 
many inputs — in other words, there is no “best” program for f. Now, this result 
is certainly in conflict with intuition, but acceptable it must be, for the axioms 
in this case are unassailable. 

Acceptability of AC was initially hampered by a similar phenomenon: It 
implied results that were unexpected and hard to swallow. The most notable such 
result was Zermelo’s theorem that every set can be well-ordered, in particular 
that the set of reals can. See also the discussion in Wilder (1963, pp. 73-74); in 
particular note the concluding paragraph on p. 74. 

To AC’s defense, we observe that mathematics is not entirely innocent of 
counterintuitive constructions or theorems even in AC’s absence. We have al- 
ready noted Blum’s theorem. Other examples are Weierstrass’s construction of 
a continuous nowhere differentiable function, and Peano’s space-filling curve 
(see Apostol (1957, p. 224)). Besides, we need AC because of vested interest. 
Without it, much of mathematics is lost. For example, the standard fact that a 
countable union of countable sets is countable crumbles if we disown AC (and 
this may come as a surprise to many readers).‡

† See Blum (1967), or Tourlakis (1984), where this theorem is rehearsed in detail.

‡ Feferman and Levy (1963) have constructed a model of Zermelo-Fraenkel set theory without
AC, where the reals R, provably uncountable in ZF, are a countable union of countable sets.
(4) Two formal (i.e., syntactical) questions about AC must be settled right
away: 


(a) If ZF is consistent and we add AC to its axioms, is the new theory, ZFC, 
still consistent? 

(b) Is AC provable in (i.e., a theorem of) ZF, assuming that ZF is consistent? 
(Of course, an inconsistent ZF would prove every formula, including the 
one that states AC (I.4.21).) 


Gödel has answered (a) positively (1939, 1940; see also Devlin (1978)) by
two different methods, constructing in ZF the constructible universe of sets (it
is his first construction that we “popularized” within naive set theory to define
L in this section). On the other hand, Fraenkel and Mostowski (see Jech (1978a))
and Cohen (1963) answered question (b) negatively.

Thus, both AC and its negation are consistent with ZF, and one can take 
or leave AC without logical penalty either way. In this sense, AC has in the 
context of ZF the same status that Euclid’s axiom on parallels has in the context 
of axiomatic geometry. Adopting or rejecting Euclid’s axiom is just a reflection 
of what kind of geometry one wants to do. Similarly, adopting AC or not reflects 
the sort of set theory, and ultimately mathematics, one wants to do. 

As we have indicated earlier, it makes sense to take a more direct approach to 
our choice of axioms (rather than the indirect, or “results-driven”, approach), for 
it is easy to be misled by strange but correct results. If at all possible, we should 
adopt axioms by judging them on their own plausibility rather than on that of 
their consequences. On that count, AC is nowadays generally accepted without 
apology, since it is not any less plausible than, say, the axiom of replacement. It is 
noteworthy that the first order logic which Bourbaki uses as the foundation of his
multi-volume work Éléments de Mathématique contains a powerful “selection”
axiom, using the τ-operator (cf. Section I.6), that directly turns the axiom of
choice of set theory into a theorem. 


IV.3. Exercises 


IV.1. Show by an example that the assumption of pairwise disjointness is
essential in the proof (3) → (4) of IV.1.2.


IV.2. Show that if for two objects A and B in L the formula A ∈ B is true,
then A ◁ B is also true.

The following exercises are best approached after the reader has mastered
the concepts of order and inductive definitions on ordered sets (Chapter VI).
They are presented here because of their thematic unity with the concepts of
the present chapter. The Kuratowski-Zorn version of AC is particularly useful
in many branches of mathematics.
The reader is encouraged to try them out informally.


IV.3. AC implies that (Zermelo’s well-ordering theorem) every nonempty set
A can be well-ordered, i.e., an order < can be defined on A so that every
nonempty B ⊆ A has a <-minimal element b (that is, ¬(∃x ∈ B) x < b).
(Hint. Follow Zermelo’s (1904) proof. Do not use the overkill of ordinals
(VI.5.50). Instead, let f be a choice function on F = 𝒫(A) − {∅}. Let
B ∈ F be called distinguished if it can be well-ordered by some order
<_B so that, for every b ∈ B, b = f(A − {x ∈ B : x <_B b}) (we call
{x ∈ B : x <_B b} a segment (in B), the one that is determined by b).
For example, {f(A)} and {f(A), f(A − {f(A)})} are distinguished (in
the latter, of course, we set f(A) < f(A − {f(A)})). Show that for any
two distinguished sets B and C, one is a segment of the other, if they are
not identical (think of a maximal common segment; {f(A)} is certainly
a common segment). Take then the union of all distinguished sets, and
compare with A.)


IV.4. A linearly or totally ordered set A is one equipped with an order < such
that for any a, b in A it is true that a = b ∨ a < b ∨ b < a.

Formally, we proclaim that <: A → A is a linear order if we have a proof
of (or have assumed) (∀a)(∀b)(a = b ∨ a < b ∨ b < a).


Show that if every set can be well-ordered, then (Hausdorff) in every set A
ordered by, say, <, every totally ordered subset B is included (⊆) in a maximal
totally ordered subset M of A.

Note. Maximality means that if a ∈ A − M, then for some m ∈ M neither
a < m nor m < a.

(Hint. If B = A, there is nothing to prove. Else, let <_W be a well-ordering
of A − B (which in general has no relationship to < that is already given on
A). By induction on <_W, partition A − B into a good and a bad set: Put the
<_W-minimum element of A − B in the good one if it is <-comparable with all
x ∈ B; else put it in the bad one. If all the elements of {x ∈ A − B : x <_W a}
have been so placed, then place a in the good set if it is <-comparable with all
the elements of B and with all the elements already placed in the good set; else
put it in the bad one.)


IV.5. Show that the italicized statement that follows (due to Kuratowski and
Zorn; also known as “Zorn’s lemma”) is a consequence of Hausdorff’s
theorem in Exercise IV.4 above. If every totally ordered subset B of an
ordered (by <) set A has an upper bound (that is, an element b ∈ A such
that x ∈ B implies x = b or x < b; in short, x ≤ b), then for every
element x ∈ A there is a <-maximal element a of A such that x ≤ a.
Note. a above is <-maximal in A in the sense that ¬(∃x ∈ A) a < x.


IV.6. Show that the Kuratowski-Zorn theorem in Exercise IV.5 above implies
AC. This shows the equivalence of all four: AC, the Zermelo well-ordering
theorem, Hausdorff’s theorem, and Zorn’s lemma.

(Hint. Let A ≠ ∅ be given. We want a choice function on F = 𝒫(A) − {∅}.
There certainly are choice functions on some subsets of F (on any finite
subsets, as a matter of fact). Let 𝒴 be the set (why set?) of all choice
functions on subsets of F. For f, g in 𝒴 define the order f < g to mean
f ⊂ g. Next, argue that any totally ordered subset of 𝒴 has an upper
bound, and thus apply Zorn’s lemma to get a <-maximal member ψ of
𝒴. Argue that ψ is a choice function on F.)


V


The Natural Numbers; Transitive Closure 


V.1. The Natural Numbers 


We are now at a point in our development where much would be gained in
expositional smoothness were we to have a formal counterpart of N in set theory.
For example, the main result of the next section is that of the existence of
the transitive closure, P⁺, of an arbitrary relation P (an important result that we
will need in Chapter VI). We will prove that P⁺ = ⋃_{i=1}^∞ Pⁱ.
This requires that we settle the questions

(1) what is Pⁱ, and
(2) what is ⋃_{i=1}^∞ Pⁱ?


Now this issue is much more complex than dealing with one (or, in any
case, “finitely” many) Pⁱ at a time (like P², P¹⁰¹, P¹⁷³⁰), which we can
define, and use, formally without the need for a formal copy of N. The trick of
absorbing the informal number i inside the name so that it is invisible in the
theory was done before (and discussed, for example, on p. 12). For example,
P² = {⟨x, y⟩ : (∃z)(y P z ∧ z P x)}.

Here we need to collect all the infinitely many Pⁱ into a class, and to allow
the formal system to “see” the variable i, in order to speak of “⋃_{i=1}^∞ …”, a
short form of “{z : (∃i in an appropriate ZFC set of i’s) z ∈ …}”.

Clearly, this is true even if P is a set (a restriction we want to avoid);† therefore
we need to formalize the presence of the “natural number” i. 

† When P is a set, things are a bit easier. We can then prove existence of P⁺ not by confronting
⋃_{i=1}^∞ Pⁱ but by avoiding it. See Exercise V.16.

A similar situation arises in computer programming: We can use “informal
subscripts” 1, 2, 3, … to denote several unrelated variables as X1, X2,
X3, … but we cannot refer to these informal subscripts within the programming
formalism to access, say, the ith variable Xi, since the programming
language does not see the i inside the name. For example,

    for i = 1 to n do
        Xi ← i
    end do

just refers to one variable (named Xi) at all times and it successively changes
its value from 1 through n. It does not refer to variables (named) X1, X2, …,
Xn.

Now, if we use i as a formal subscript as in a subscripted variable (or “array”)
X[i] then the following refers to n different variables, X[1] through X[n]:

    for i = 1 to n do
        X[i] ← i
    end do
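The contrast between the two loops can be reproduced in an actual programming language. The following Python fragment is only an illustration (the value n = 4 and the unused cell X[0] are choices made here so that subscripts run from 1 to n as in the text).

```python
n = 4

# Informal subscript: "Xi" is a single name; the language never sees the i in it.
Xi = None
for i in range(1, n + 1):
    Xi = i  # the same variable is overwritten n times

# Formal subscript: X[i] denotes a different cell for each value of i.
X = [None] * (n + 1)  # X[0] is left unused so that subscripts run 1..n
for i in range(1, n + 1):
    X[i] = i  # n different variables X[1], ..., X[n]
```

After the first loop only the last value survives in Xi, whereas the second loop leaves n separately addressable values behind.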


We proceed now to introduce a formal counterpart of N, denoted by the standard
symbol ω.

V.1.1 Definition. A set A is inductive iff

    ∅ ∈ A ∧ (∀x ∈ A) x ∪ {x} ∈ A

For any set x, x ∪ {x} is called the successor of x. A set y is a successor iff
y = x ∪ {x} for some x.

Thus “A is inductive” is (represented by) a formula of set theory.


V.1.2 Example. An inductive set A contains ∅, {∅}, {∅, {∅}}, and as we go
on, applying the successor operation again and again, we increase the depth
of nesting of { }-brackets by 1 at every step. So these depths are successively
0, 1, 2, ….

We can identify these depths of nesting with the natural numbers of our
intuition. Better still, we can identify ∅, {∅}, {∅, {∅}}, etc., with the natural
numbers (this is “better” because, unlike the nebulous “depth of nesting”, these
sets are objects of set theory).
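The first few sets of this sequence can be built explicitly. The sketch below models hereditarily finite sets with Python's `frozenset`; the function names `successor`, `numeral` and `depth` are introduced here for illustration, and `depth` counts the empty set as having depth 0, as in the example.

```python
EMPTY = frozenset()

def successor(x):
    """Return x ∪ {x}."""
    return x | frozenset([x])

def numeral(n):
    """Apply the successor operation n times, starting from the empty set."""
    x = EMPTY
    for _ in range(n):
        x = successor(x)
    return x

def depth(x):
    """Depth of nesting of braces, with the empty set at depth 0."""
    return 0 if not x else 1 + max(depth(y) for y in x)

two = numeral(2)  # {∅, {∅}}
```

On these small cases the depth of nesting of the n-th set is n, and so is its number of elements.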


Of course we have to settle a few things. Does any inductive set really exist?
Is it not possible that an inductive set might contain much more than what we
would care to identify with natural numbers? In a way, the answers are “yes”
and “yes”: the first by the “axiom of infinity”, the second by the fact that
we have “limit ordinals” larger than ω. These are inductive sets that contain
much more than just copies of the intuitive natural numbers (this will be better
understood in Chapter VI).


V.1.3 Axiom (Axiom of Infinity). There is an inductive set. 


By the remark following V.1.1, the axiom is a formula of set theory. Because
of Example V.1.2, an inductive set (if such a set “truly” exists) has as a subset
a set of “aliases” of all the members of N, so it is intuitively infinite; hence the
axiom name is appropriate. Now why is the axiom “really true”? Because we can
certainly construct each (real) set in the infinite sequence ∅, {∅}, {∅, {∅}}, …,
for each integer depth of nesting n ∈ N,† and put them all in a class. Now this
class has the same size as the (real) set N (why?); hence it must be a set by
the “size limitation doctrine” of Chapter III. Alternatively, we can say that since
collection is “true” and N is a set, then ran(f) is also a set (II.11.28), where
f is the function with domain N that for each n ∈ N “outputs” the set in the
sequence ∅, {∅}, … that has depth of nesting of braces equal to n.


Furthermore, by construction, ran(f) is inductive. 


It should be noted that the negation of the axiom of infinity is “no inductive sets 
exist”, not “infinite sets do not exist” (see Exercise VI.54). 


Finally, we should mention that it is known that Axiom V.1.3 is not provable 
by the axioms we have so far (again, see Exercise VI.54); therefore it is a 
welcome addition, being intuitively readily acceptable (and necessary). 


V.1.4 Lemma. If ℱ is a nonempty family of inductive sets, then ⋂ℱ is an
inductive set.


Proof. Easy exercise. 


† To reach any set in the sequence that involves depth of nesting of { }-brackets equal to n, all we
have to do is to write down a proof, of length n + 1, that starts with the statement “∅ is a set” and
repeatedly uses the lemma “since x is a set, then so is x ∪ {x}” (by union and pairing) as x runs
through ∅, {∅}, ….

V.1.5 Definition (The Formal Natural Numbers). We introduce a new constant,
ω, in the language of set theory by the explicit definition

    ω = ⋂{x : x is an inductive set}

We will call ω the set of formal natural numbers (we will drop the qualification
“formal” whenever there is no danger of confusing N and ω).

Members of ω are called (formal) natural numbers. In our metanotation
n, m, l, i, j, k (with or without primes or subscripts) default to (formal) natural
number variables unless the context dictates otherwise.

That is, we are introducing natural number typed variables in our
argot. Thus, “(∀m)𝒫[m]” is short for “(∀x ∈ ω)𝒫[x]”. “(∃m)𝒫[m]” means
“(∃x ∈ ω)𝒫[x]”.

By the axiom of infinity, ω is indeed a set (since {x : x is an inductive set} is
nonempty). By Lemma V.1.4 it is itself inductive. Clearly, ω is the ⊆-smallest
inductive set, for if A is inductive, then A ∈ {x : x is an inductive set} and hence
ω ⊆ A. This simple observation leads to


V.1.6 Theorem (Induction over ω). Let 𝒫(x) be a formula. Then

    𝒫(∅), (∀x)(𝒫(x) → 𝒫(x ∪ {x})) ⊢ (∀x ∈ ω)𝒫(x)

Proof. Assume the hypothesis. Let B = {x ∈ ω : 𝒫(x)}. By separation, B
is a set. The hypothesis (and the fact that ω is inductive) implies that B is
inductive. Hence (ω is the smallest inductive set), ω ⊆ B. That is, x ∈ ω →
x ∈ ω ∧ 𝒫(x); hence (∀x ∈ ω)𝒫(x) by tautological implication followed by
generalization.


V.1.7 Remark. The induction over ω is stated in a more user-friendly way as
“To prove (∀x ∈ ω)𝒫(x) one proves
(1) 𝒫(∅) (this is the basis), and
(2) freezing x, the hypothesis (induction hypothesis, or I.H.) 𝒫(x) implies
𝒫(x ∪ {x})”
Note that (2) above establishes (∀x)(𝒫(x) → 𝒫(x ∪ {x})) by the deduction
theorem (which uses the freezing assumption) followed by generalization.
The process in quotes proves 𝒫(x) by induction on x (over ω). x is called
the induction variable.
Applying the deduction theorem to V.1.6, one derives the ZFC theorem,

    𝒫(∅) → (∀x)(𝒫(x) → 𝒫(x ∪ {x})) → (∀x ∈ ω)𝒫(x)


We develop a few properties of the formal natural numbers that we will need 
on one hand for our theoretical development, and on the other hand in order to 
make the claim that ω is a formal counterpart of N more acceptable.

V.1.8 Lemma. n ∪ {n} ≠ ∅ for all n ∈ ω.

This corresponds to “n + 1 ≠ 0” on N, or ROB† axiom S1.

Proof. n ∈ n ∪ {n}.


V.1.9 Lemma. n ∪ {n} = m ∪ {m} implies m = n for all n, m in ω.

This corresponds to “n + 1 = m + 1 implies n = m” on N, or ROB axiom S2.
The reader will note from the proof below that this result is valid for all sets
n, m, not just those in ω.

Proof. Let instead ¬ m = n (proof by contradiction, frozen m and n). As n is
a member of the left hand side of

    n ∪ {n} = m ∪ {m}

it must be a member of the right hand side too; hence, n ∈ m. Similarly, m ∈ n, and this
contradicts the axiom of foundation (applied to {m, n}).


V.1.10 Lemma. If n ∈ ω, then either n = ∅ or n = m ∪ {m} for some m ∈ ω.

Proof. Let‡ 𝒫(n) ≡ n = ∅ ∨ (∃m ∈ ω)(n = m ∪ {m}). Clearly, ⊢ 𝒫(∅), in
pure logic, by axiom x = x and substitution. Assume next 𝒫(x) for frozen x
(I.H.), and prove 𝒫(x ∪ {x}):

Now 𝒫(x) entails two cases, x = ∅ and ¬ x = ∅. The first yields that x ∈ ω.
The second allows the introduction of the assumption

    m ∈ ω ∧ x = m ∪ {m}

where m is a new constant. Since ω is inductive, x ∈ ω. Thus, the logical fact

    ⊢ x ∪ {x} = x ∪ {x}

and the substitution axiom Ax2 yield

    ⊢ (∃m)(m ∈ ω ∧ x ∪ {x} = m ∪ {m})

and therefore ⊢ 𝒫(x ∪ {x}) by tautological implication.


† ROB stands for Robinson’s axiomatic arithmetic, studied in volume 1, Chapter I.

‡ “≡” means string equality (cf. I.1.4, p. 13).

V.1.11 Definition (Transitive Classes). A class A is transitive iff x ∈ y ∈ A
implies x ∈ A for all x, y.

This concept was introduced, in passing, in a footnote of Chapter IV. It is of the
utmost importance: not only are the ordinals (in particular the formal natural
numbers) transitive sets (while the class of all ordinals is a transitive class)
but also such sets play a major role in the model theory of set theory.

Note that “a class A is transitive” amounts to ran(∈ ↾ A) ⊆ A, where, of
course, we employ the symbol “∈” to denote the relation {⟨y, x⟩ : x ∈ y}
defined by the nonlogical symbol (predicate) also denoted “∈”. In words: For
all inputs x ∈ A, the relation ∈ has all its outputs in A as well. Or, as we say,
A is ∈-closed.†


V.1.12 Lemma. Every natural number is a transitive set.

Proof. We prove‡

    (∀z)(∀x, y)(x ∈ y ∈ z → x ∈ z)    (1)

by induction on z.
Basis. x ∈ y ∈ ∅ → x ∈ ∅ is provable, since x ∈ y ∈ ∅ is refutable.
I.H. For a frozen z assume

    (∀x, y)(x ∈ y ∈ z → x ∈ z)    (2)

Let now x, y be frozen variables,§ and add the assumption x ∈ y ∈ z ∪ {z}.
Case y ∈ z. Then (I.H. and specialization) x ∈ z ⊆ z ∪ {z}.
Case y = z. Then x ∈ z ⊆ z ∪ {z}. By the deduction theorem,

    x ∈ y ∈ z ∪ {z} → x ∈ z ∪ {z}

Hence

    (∀x, y)(x ∈ y ∈ z ∪ {z} → x ∈ z ∪ {z})


† The reader will recall that in “y ∈ x”, “x” is the input, for according to our conventions ⟨x, y⟩ is
a pair in ∈. A sizable part of the literature has “y ∈ x” to mean ⟨y, x⟩ is in ∈, i.e., it has y as the
input. Naturally, for them, a transitive class is not ∈-closed; instead it is ∈⁻¹-closed.

‡ We use the shorthand “(∀x, y)” for “(∀x)(∀y)”.

§ That is, we must remember not to universally quantify them or substitute into them prior to our
intended application of the deduction theorem.

V.1.13 Lemma. ω is a transitive set.

Proof. We prove

    (∀y)(∀x)(x ∈ y ∈ ω → x ∈ ω)    (1)

by induction on y.

Basis. x ∈ ∅ ∈ ω → x ∈ ω is provable, since x ∈ ∅ ∈ ω is refutable.

I.H. Freeze y ∈ ω, and assume

    (∀x)(x ∈ y ∈ ω → x ∈ ω)    (2)

To argue the case for y ∪ {y}, let now x be frozen† and add the assumption
x ∈ y ∪ {y}.
Case x ∈ y. Then (I.H. and specialization) x ∈ ω.

Case x = y. Then x ∈ ω.
Thus, we have proved (deduction theorem followed by generalization)

    (∀x)(x ∈ y ∪ {y} ∈ ω → x ∈ ω)
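Lemmata V.1.12 and V.1.13 can be spot-checked on the first few natural numbers. The sketch below models finite sets with `frozenset`; it tests finitely many cases and proves nothing, and the helper names (`successor`, `numeral`, `is_transitive`) are introduced here for illustration only.

```python
EMPTY = frozenset()

def successor(x):
    """Return x ∪ {x}."""
    return x | frozenset([x])

def numeral(n):
    """The n-th set in the sequence ∅, {∅}, {∅, {∅}}, ..."""
    x = EMPTY
    for _ in range(n):
        x = successor(x)
    return x

def is_transitive(a):
    """a is transitive iff x ∈ y ∈ a implies x ∈ a."""
    return all(x in a for y in a for x in y)

numerals = [numeral(n) for n in range(6)]

# V.1.12 on a sample: each of these numbers is a transitive set.
all_transitive = all(is_transitive(n) for n in numerals)

# V.1.13 on a sample: members of these numbers are again numbers.
members_are_numerals = all(m in numerals for n in numerals for m in n)

# A set that is not transitive, for contrast: {{∅}} lacks the member ∅.
not_transitive = frozenset([frozenset([EMPTY])])
```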


The above two lemmata say quite a bit about the structure of natural numbers:

(1) Every natural number is a transitive set.
(2) Every member of a natural number is a natural number (V.1.13).

Add to this that
(3) A natural number is a successor, or equal to ∅ (V.1.10),

and we have a complete characterization of natural numbers that does not need
the axiom of infinity anymore. (See Exercise V.5.) Well, we will need infinity
sooner or later, and we will need induction and inductive definitions over ω
sooner rather than later, so it was not a bad idea to introduce the “whole” ω
now.

V.1.14 Example. Is ω a successor? No, for if ω = x ∪ {x} for some x, then
x ∈ ω. Since ω is inductive, x ∪ {x} ∈ ω as well, i.e., ω ∈ ω, which is impossible
by foundation.


† In argot one often says “let x be arbitrary but fixed”, referring to the “value” of x.

V.1.15 Example (Predecessors). By Lemmata V.1.8, V.1.9, and V.1.10 the
formula n ∈ ω → (∃!m ∈ ω)(n = ∅ ∧ m = ∅ ∨ n ≠ ∅ ∧ n = m ∪ {m}) is
provable in set theory.

Thus we can introduce a function symbol of arity 1, pr, the predecessor
function symbol, by the axiom (see III.2.4, p. 122, in particular, (10), (11),
(19), (20))

    pr(x) = y ↔ (x ∈ ω ∧ (x = ∅ ∧ y = ∅ ∨ x ≠ ∅ ∧ x = y ∪ {y}))
                ∨ x ∉ ω ∧ y = ∅    (1)

Thus (aforementioned (19) and (20) respectively)

    ⊢ x ∈ ω → x = ∅ ∧ pr(x) = ∅ ∨ x ≠ ∅ ∧ x = pr(x) ∪ {pr(x)}

and

    ⊢ x ∉ ω → pr(x) = ∅


V.1.16 Remark. By the above, whenever n ≠ ∅ is a natural number, then its
predecessor pr(n) is also a natural number such that pr(n) ∈ n. By the transitivity
of each natural number n, the predecessor of the predecessor of n (if the
latter is not ∅) is also a member of n, and so on; thus each natural number n is
the set of all natural numbers that “precede it” in the sense that “m precedes n
iff m ∈ n”.

This remark is important, yet trivial. Another way to see it is to note that
n = {x : x ∈ n} is provable for any set n. Now if n ∈ ω, then so are all the
x ∈ n, so with the notational convention of Definition V.1.5 one can write

    n = {m : m ∈ n}.
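For the finite sets ∅, {∅}, {∅, {∅}}, … the predecessor can be computed by searching the members of n for the one whose successor is n. The sketch below is an illustration under an explicit assumption: `pr` is only meant to be applied to sets produced by `numeral`, so the clause of axiom (1) for arguments outside ω is not modelled. All names are chosen here.

```python
EMPTY = frozenset()

def successor(x):
    """Return x ∪ {x}."""
    return x | frozenset([x])

def numeral(n):
    """The n-th set in the sequence ∅, {∅}, {∅, {∅}}, ..."""
    x = EMPTY
    for _ in range(n):
        x = successor(x)
    return x

def pr(n):
    """Predecessor of a numeral: pr(∅) = ∅, otherwise the m with n = m ∪ {m}."""
    if not n:
        return EMPTY
    for m in n:
        if successor(m) == n:
            return m
    raise ValueError("argument is not a numeral (assumption of this sketch)")

four = numeral(4)
# Remark V.1.16 on a sample: a natural number is the set of those preceding it.
four_as_set_of_predecessors = frozenset(numeral(k) for k in range(4))
```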


V.1.17 Theorem (The Minimality Principle for ω, and for Any n ∈ ω).

(1) For ω: If A is a nonempty subset of ω, then there is an m ∈ A such that
¬(∃n ∈ A) n ∈ m.

(2) For any natural number k: If A is a nonempty subset of k, then there is
an m ∈ A such that ¬(∃n ∈ A) n ∈ m.

An element such as m is called an ∈-minimal element of A.

Proof. (1): Formally, we want to prove the theorem schema

    (∃m)𝒫[m] → (∃m)(𝒫[m] ∧ ¬(∃n)(𝒫[n] ∧ n ∈ m))

By the argot conventions of V.1.5, the above is short for

    (∃x)(x ∈ ω ∧ 𝒫[x]) → (∃x)(x ∈ ω ∧ 𝒫[x] ∧ ¬(∃y)(y ∈ ω ∧ 𝒫[y] ∧ y ∈ x))

which is provable (for any 𝒫) by the axiom of foundation.
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(2): Formally, since x €k <> x € aA x €k is provable by V.1.13, we just 
want to prove the theorem schema 


(am € kK) Pim] —> (am € (Alm) An EK(Plnl Ane m)) 
This translates to 


(Am)(m ek A Pim) > (Gm)(m EkAP|m]A7n)(nekAPInlAne m)) 


and is provable by part (1). 
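For finite sets of numerals the minimality principle can be observed directly. The following is a minimal sketch, again modelling numerals as frozensets; `epsilon_minimal` is a hypothetical helper name, and a brute-force search obviously proves nothing about $\omega$ itself.

```python
EMPTY = frozenset()

def nat(k):
    """The numeral for the Python int k >= 0."""
    n = EMPTY
    for _ in range(k):
        n = n | frozenset([n])
    return n

def epsilon_minimal(A):
    """Return some m in A such that no n in A satisfies n ∈ m."""
    for m in A:
        if not any(n in m for n in A):
            return m
    raise ValueError("A must be nonempty")

A = {nat(4), nat(2), nat(7)}
assert epsilon_minimal(A) == nat(2)
```

Here 2 is returned because neither 4 nor 7 is a member of 2, whereas 2 is a member of both 4 and 7.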


V.1.18 Metatheorem (Relating $\omega$ and $\mathbb{N}$). There is a 1-1 correspondence
$I : \mathbb{N} \to \omega$ (where “$\omega$” here denotes the “real” smallest inductive set) that
“translates the successor on $\mathbb{N}$ to the successor on $\omega$”, namely,

$$I(0) = \emptyset$$

and, for $n \geq 0$, $n \in \mathbb{N}$,

$$I(n+1) = I(n) \cup \{I(n)\}$$


Proof. (In the metatheory.) Taking recursive (inductive) definitions over N for 
granted, a unique and total I, as defined by recursion in the statement of the 
metatheorem, exists. Let us prove its other stated properties. 

1-1-ness: By (metatheoretical) induction on $n - m \geq 1$ ($n, m$ in $\mathbb{N}$) over $\mathbb{N}$
we will prove that $m < n \to I(m) \in I(n)$, hence $m \neq n \to I(n) \neq I(m)$.

Basis. If $n - m = 1$, then $I(n) = I(m) \cup \{I(m)\}$.

I.H. Assume the claim for $n - m = k$. Case $n - m = k + 1$: Now $I(n) =
I(m + k + 1)$, so that $I(m + k) \in I(n)$. By I.H., $I(m) \in I(m + k)$ so that
$I(m) \in I(n)$, since the sets $I(i)$ are transitive.

Ontoness: By contradiction, let $n \in \omega$ be $\in$-minimal such that $n \notin \mathrm{ran}(I)$.
Now, $n \neq \emptyset$ for $\emptyset \in \mathrm{ran}(I)$. Thus (V.1.15), $n = \mathrm{pr}(n) \cup \{\mathrm{pr}(n)\}$. Since
$\mathrm{pr}(n) \in n$ and $n$ is minimal with the above property, $\mathrm{pr}(n)$ fails the
property, that is, $\mathrm{pr}(n) = I(m)$ for some $m \in \mathbb{N}$. But then $I(m+1) = n$, hence
$n \in \mathrm{ran}(I)$; a contradiction.
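The correspondence $I$ can be tabulated for small arguments. The sketch below assumes Python ints stand in for the members of $\mathbb{N}$ and frozensets for the members of $\omega$; `BOUND` is an arbitrary cut-off chosen for the check, not something taken from the text.

```python
EMPTY = frozenset()

def I(k):
    """I(0) = ∅ and I(k + 1) = I(k) ∪ {I(k)} for a Python int k >= 0."""
    n = EMPTY
    for _ in range(k):
        n = n | frozenset([n])
    return n

BOUND = 7  # arbitrary small bound for this finite check
for m in range(BOUND):
    # I(m) has exactly m elements.
    assert len(I(m)) == m
    for n in range(BOUND):
        # Order is translated to membership, and I is 1-1 on the range tested.
        assert (m < n) == (I(m) in I(n))
        assert (m == n) == (I(m) == I(n))
```

The checks cover only the finitely many values below `BOUND`; the general statement is what the metatheorem establishes by induction.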


V.1.19 Remark. In Metatheorem V.1.18 we have established that the “real”
structures $(\mathbb{N}; 0, \lambda x.x + 1; <)$ and $(\omega; \emptyset, \lambda x.x \cup \{x\}; \in)$ are isomorphic in a
unique (uniqueness of $I$) and “natural” way. This is the well-known result of
naive set theory that the order type of $\mathbb{N}$ is $\omega$.


‡ See I.2.13, p. 26, for justification in a general setting. We will consider their formal counterparts
over $\omega$ shortly.

§ By correctness and soundness of ZFC, the real $\omega$ satisfies the minimality principle, i.e.,
Theorem V.1.17 is really true.


(1) The isomorphism $I$ preserves the initial element ($I(0) = \emptyset$),

(2) it preserves the operation of successor ($I(n + 1) = I(n) \cup \{I(n)\}$), and

(3) it preserves order (we proved above that $m < n \to I(m) \in I(n)$).

(4) $I$ uniquely names the formal natural numbers $\emptyset, \{\emptyset\}, \ldots$ as $0, 1, \ldots$;
moreover,

(5) $I$, informally, assigns to each formal natural number its number of elements:
It assigns 0 to $\emptyset$, which is correct, and if we assume that $n \in \mathbb{N}$ correctly
measures the number of elements in $I(n)$ (induction over $\mathbb{N}$), then $n + 1$ is
the number of elements of $I(n) \cup \{I(n)\}$, for from $I(n) \notin I(n)$ it follows
that $I(n)$ is a net new element added in passing from $I(n)$ to $I(n) \cup \{I(n)\}$.

(6) Further observing the “real” $\omega$ we extract one more piece of information:
We know that $<$ on $\mathbb{N}$ satisfies trichotomy, i.e., for any $n, m$ in $\mathbb{N}$,
$m < n \lor m = n \lor n < m$ is true. We know that $\in$ does not satisfy trichotomy
on $U_M$ (think of an example); however, in view of the isomorphism $I$, we
expect that the transform of $<$ (that is, $\in$ restricted to the real $\omega$) does satisfy
trichotomy. Indeed, continuing to argue in the metatheory, let $m, n$ be in $\omega$
such that $m \neq n$ and let $n', m'$ in $\mathbb{N}$ be such that $I(n') = n$, $I(m') = m$. By
single-valuedness of $I$, $n' \neq m'$. Then $n' < m'$ or $m' < n'$; hence $n \in m$ or
$m \in n$ respectively. So $(\forall m, n)(m \in n \lor m = n \lor n \in m)$ is true in (the
real) $\omega$.


By Gödel’s incompleteness theorem, there are really true sentences of the
language of set theory that are not provable in ZFC. However, trichotomy

$$(\forall m, n)(m \in n \lor m = n \lor n \in m)$$

is not one of those. That this “truth” is formally provable within set theory we
will see shortly. First, let us summarize our position vs. $\omega$:


Hold on a minute! How do we know that trichotomy is true in $\mathbb{N}$? It positively is
the wrong reason to advance that this might be because $\mathfrak{N}$, the standard model
of PA, of course satisfies the PA axioms. That puts the cart well in advance
of the horse, since the formal PA is an afterthought, a symbol-shuffling game
we play within real mathematics. That is, we may rightly be accused here of
circular thinking: PA proves trichotomy just because we thought trichotomy
was really true, and so we “fixed” the PA axioms to derive it as a theorem!
Perhaps a more satisfying argument is that a real natural number indexes
(counts) the members of a string of strokes, “|”. Thus, if we have two natural
numbers $m$ and $n$, we can associate with them two strings, |...| and |...|,
of the requisite numbers of strokes. We can think of such strings as the unary
notations for those numbers.

Now, “obviously”, if we superimpose the two sequences of strokes so that
the two leftmost strokes in each coincide with each other, then either the two
sequences totally coincide (case $m = n$), or the first sequence ends before the
second (is a “prefix” of the second; case $m < n$), or the “otherwise” obtains
(case $n < m$). □


Metamathematically the two structures 
$$(\mathbb{N}; 0, \lambda x.x + 1; <) \quad\text{and}\quad (\omega; \emptyset, \lambda x.x \cup \{x\}; \in)$$


are indistinguishable (which is pretty much what “isomorphic” means). This 
motivates the following behaviour on our part henceforth: From now on we will 
enrich the argot we speak when we do (formal) set theory to include: 


“set of natural numbers” to mean $\omega$;
“natural number $n$” to mean $n \in \omega$


(that is, we drop the qualification “formal”).


We will utilize the naming induced by the “external” (metamathematical)
1-1 correspondence $I$ without any further special notice, reserving the right to
fall back to rigid notation ($\emptyset$, $\{\emptyset\}$, etc.) whenever we feel that the exposition
will benefit from us doing so. Specifically, $n + 1$ stands for the successor
$n \cup \{n\}$ in the context of natural numbers; in particular, we can write
$n + 1 = n \cup \{n\}$.


$n - 1$ is another name for $\mathrm{pr}(n)$ whenever $n \neq \emptyset$; we write 0 for $\emptyset$, 1 for $\{\emptyset\}$
and, in general, $n$ for $\{0, 1, \ldots, n - 1\}$ if $n \neq 0$.


We write $<$ for $\in$ whenever we feel that intuition will benefit from this
notation. Then, $m \leq n$, i.e., $m < n \lor m = n$, is $m \in n \lor m = n$; in short, $m \subseteq n$
due to the transitivity of $n$.


In examples, and metamathematical discussions in general, we will continue
to draw from our wider mathematical experience, and real sets such as $\mathbb{N}$ and
$\mathbb{R}$ will continue being used and talked about. All we have done here is to find
an isomorphic image of $\mathbb{N}$ in the real realm that is easily seen to have a formal
counterpart within the formal theory. We are not saying that this (real $\omega$) is
really the set of natural numbers rather than that ($\mathbb{N}$), as such a statement would
be meaningless (and pointless) mathematically. It is the job of the philosopher
to figure out exactly what are the natural numbers. The set theorist is content
with using sets whose elements behave like natural numbers for purposes that
include the ones articulated at the outset in this chapter. □


V.1.20 Theorem (Trichotomy on $\omega$).

$$\vdash_{ZFC} (\forall n)(\forall m)(m \in n \lor m = n \lor n \in m)$$


Proof. Recall V.1.5. We proceed by contradiction, so let us assume (or “add”)
instead

$$(\exists n)(\exists m)(\neg m \in n \land \neg m = n \land \neg n \in m) \tag{1}$$

By the minimality principle on $\omega$ (V.1.17), let $n_0$ be $\in$-minimal† in $\omega$ such that‡

$$(\exists m)(\neg m \in n_0 \land \neg m = n_0 \land \neg n_0 \in m) \tag{2}$$

Again, let $m_0$ be $\in$-minimal such that

$$\neg m_0 \in n_0 \land \neg m_0 = n_0 \land \neg n_0 \in m_0 \tag{3}$$

We proceed to prove $n_0 = m_0$, thus obtaining a contradiction.


Let $x \in n_0$ (which can also be written as “let $i \in n_0$” in view of V.1.13). By
minimality of $n_0$, (2) yields $(\forall m)(m \in x \lor m = x \lor x \in m)$, and by
specialization,

$$m_0 \in x \lor m_0 = x \lor x \in m_0 \tag{4}$$

Which $\lor$-clause is provable in (4)? Well, each of $m_0 \in x$ and $m_0 = x$ yields
$m_0 \in n_0$ (using transitivity of $n_0$, V.1.12), which contradicts (3). It is then the
case that $x \in m_0$, which proves $n_0 \subseteq m_0$.


The symmetry of (3) suggests (check it!) that an entirely analogous argument
yields $x \in m_0 \to x \in n_0$ and hence $m_0 \subseteq n_0$. All in all, we have both (3) and
$n_0 = m_0$, a contradiction.
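Trichotomy can be checked mechanically on an initial segment of the numerals, and the same check shows how it fails for sets in general. This is a sketch under the frozenset modelling assumption; `trichotomy_holds` is a hypothetical helper, and the set `odd` is just one convenient counterexample.

```python
EMPTY = frozenset()

def nat(k):
    """The numeral for the Python int k >= 0."""
    n = EMPTY
    for _ in range(k):
        n = n | frozenset([n])
    return n

def trichotomy_holds(a, b):
    """True when exactly one of a ∈ b, a = b, b ∈ a holds."""
    return [a in b, a == b, b in a].count(True) == 1

numerals = [nat(k) for k in range(6)]
assert all(trichotomy_holds(m, n) for m in numerals for n in numerals)

# Outside ω trichotomy can fail: ∅ and {{∅}} are distinct,
# and neither is a member of the other.
odd = frozenset([frozenset([EMPTY])])
assert not trichotomy_holds(EMPTY, odd)
```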


Because of trichotomy, $\in$-minimal elements in $\emptyset \neq A \subseteq \omega$ are unique, for if
$m \neq n$ in $A$ are both $\in$-minimal, then we get the contradiction (to V.1.20)

$$\neg m \in n \land \neg n \in m \land \neg m = n$$

An $\in$-minimal element $m$ of $A$ thus is also $\in$-minimum ($\in$-least, $\in$-smallest) or
just minimum (least, smallest) in the sense that $x \in A \to m \in x \lor m = x$, or,
to use our new argot, $x \in A \to m \leq x$. This is so by trichotomy, since $x \in m$
fails. □


† The qualification “in $\omega$” can be omitted (indeed, it will be in the remainder of the proof) in view
of the naming convention of V.1.5.

‡ This uses proof by auxiliary constant, $n_0$, between the lines. The reader was forewarned at
the beginning of Chapter IV that we will be increasingly using the “relaxed” proof style (see
also III.5.9, p. 148).


V.1.21 Theorem (Recursive Definitions over $\omega$). Given a set $A$ and a total
function $g : \omega \times A \to A$ in the sense of III.11.12, there exists a unique total
function $f : \omega \to A$ that satisfies the following recursive definition:

$$\begin{aligned} f(0) &= a, \quad\text{where } a \in A\\ \text{for } n \geq 0,\quad f(n+1) &= g(n, f(n)) \end{aligned} \tag{R}$$

Proof. It is convenient to prove uniqueness first. Arguing by contradiction, let
$f'$ satisfy the identical recursive definition as $f$ yet be different from $f$. Let $m$
be the least such that $f(m) \neq f'(m)$. By the basis of the definition (R), $m \neq 0$,
and hence $m - 1 \in \omega$ and $f(m - 1) = f'(m - 1)$. But then,

$$\begin{aligned} f(m) &= g(m - 1, f(m - 1))\\ &= g(m - 1, f'(m - 1))\\ &= f'(m) \end{aligned}$$

contradicting the hypothesis. Uniqueness is settled. Indeed, the argument
applies unchanged for recursive definitions with a natural number $n$ as domain
(i.e., total $f : n \to A$), since the minimality principle holds on $n$ as well (V.1.17).


We address existence next. 
Preamble. For the existence part one is tempted to argue as follows:


Argument. (R) gives the value of $f$ at 0, namely $a$. So, if we take the I.H. that
$f(n)$ is defined, then (R) (second equation) shows that $f(n + 1)$ is also defined
(since $g$ is total). By induction over $\omega$, $f$ is defined for all $n \in \omega$, hence it exists.


The above argument is drastically off the mark. All we really have argued
about is that any $f$ that happens to satisfy (R) also satisfies $\mathrm{dom}(f) = \omega$, or,
“if an $f$ satisfying (R) exists, then $\mathrm{dom}(f) = \omega$”. Thus we have not proved
existence at all. After all, a function $f$ does not need to be total in order to
“exist”.

The correct way to go about proving existence is to build “successive
approximations” of $f$ by “finite” functions that satisfy (R) on their domain. Each of
these finite functions will have as domain some natural number $n \in \omega - \{0\}$.
Towards this purpose we relax the “for $n \geq 0$” requirement in (R). It turns out that
these finite functions (if they exist) are pairwise consistent in that, for any two
of them, $h$ and $p$, one has either $h \subseteq p$ or $p \subseteq h$. Thus the union of all of them
is a function $\bar f$. With a bit of extra work we show that $\bar f$ is total and satisfies (R). □


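Let

$$\mathcal{F} = \bigl\{f : f \text{ is a function} \land \mathrm{dom}(f) \in \omega - \{0\} \land f(0) = a \land (\forall k \in \mathrm{dom}(f))\bigl(k \neq 0 \to f(k) = g(k - 1, f(k - 1))\bigr)\bigr\} \tag{1}$$

A fact used twice below is that $\mathcal{F} \neq \emptyset$. For example, $\{(0, a)\} \in \mathcal{F}$; also,
$\{(0, a), (1, g(0, a))\} \in \mathcal{F}$. The first of these two functions has domain equal
to 1, the second has domain equal to 2. □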


By the uniqueness part (applied to a function satisfying (R) on domain $n$),
for each $n \in \omega - \{0\}$ there is at most one $f \in \mathcal{F}$ with $\mathrm{dom}(f) = n$, hence, by
collection (III.11.28), $\mathcal{F}$ is a set. So is then

$$\bar f = \bigcup \mathcal{F} \tag{2}$$

Observe first that $\bar f$ is a function: Let $(a, b) \in \bar f$ and also $(a, c) \in \bar f$. Then, by (2),
$f(a) = b$ and $f'(a) = c$ for some $f$ and $f'$ in $\mathcal{F}$. Without loss of generality,
applying trichotomy, $\mathrm{dom}(f) \in \mathrm{dom}(f')$.† By uniqueness, $f = f' \restriction \mathrm{dom}(f)$,
since both sides of $=$ satisfy the same recurrence on $\mathrm{dom}(f)$.

Thus, $c = f'(a) = (f' \restriction \mathrm{dom}(f))(a) = f(a) = b$, and therefore $\bar f$ is
single-valued.

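We next argue that $\bar f$ satisfies the recurrence: Trivially, $\bar f(0) = a$ by
$\{(0, a)\} \in \mathcal{F}$. Now

$$\begin{aligned} \bar f(n + 1) &= f(n + 1) && \text{for some } f \in \mathcal{F}\\ &= g(n, f(n))\\ &= g(n, \bar f(n)) && \text{by } f \subseteq \bar f \end{aligned}$$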


We finally show that $\bar f$ is total on $\omega$: Let instead $m$ be least such that $\bar f(m)\uparrow$.
Hence, $m \neq 0$ (since $\bar f(0)\downarrow$, due to $\{(0, a)\} \in \mathcal{F}$) and $\bar f(m - 1)\downarrow$. Thus,
$\bar f(m - 1) = f(m - 1)$ for some $f \in \mathcal{F}$ with $\mathrm{dom}(f) = m$. Define $h : m + 1 \to A$
by

$$h(x) = \begin{cases} f(x) & \text{if } x \in m\\ g(m - 1, f(m - 1)) & \text{if } x = m \end{cases}$$

Clearly $h \in \mathcal{F}$ ($\mathrm{dom}(h) = m + 1$); thus $\bar f(m) = h(m)$, a contradiction.


† If $\mathrm{dom}(f) = \mathrm{dom}(f')$ then $f = f'$ by uniqueness, and hence $b = c$.

Although we have used trichotomy in the existence part, this can be avoided.
See Exercise V.4.
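The construction by successive finite approximations can be imitated computationally. The sketch below makes several simplifying assumptions that are not in the text: Python ints stand in for natural numbers, dicts stand in for finite functions, only finitely many approximations are produced (up to an arbitrary `bound`), and the sample `g` is a hypothetical choice that happens to yield the factorial.

```python
def approximations(a, g, bound):
    """Yield the finite functions with domains 1, 2, ..., bound that satisfy (R) on their domain."""
    h = {0: a}
    yield dict(h)
    for n in range(1, bound):
        h[n] = g(n - 1, h[n - 1])
        yield dict(h)

def union_of(approxs):
    """Union of the approximations, checking that they are pairwise consistent."""
    f = {}
    for h in approxs:
        for k, v in h.items():
            # Any two approximations agree wherever both are defined.
            assert f.get(k, v) == v
            f[k] = v
    return f

g = lambda n, x: (n + 1) * x          # hypothetical g; f(n + 1) = (n + 1) * f(n)
f = union_of(approximations(1, g, 6))
assert f == {0: 1, 1: 1, 2: 2, 3: 6, 4: 24, 5: 120}
```

The union is single-valued precisely because each approximation extends the previous one, which mirrors the role of $h \subseteq p$ or $p \subseteq h$ in the proof.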


We apply the theorem on recursive definitions to define addition on w. 


V.1.22 Definition (Addition of Natural Numbers). Define $f_m : \omega \to \omega$ for
each $m \in \omega$ by

$$\begin{aligned} f_m(0) &= m\\ f_m(n + 1) &= f_m(n) + 1 \end{aligned}$$


V.1.23 Remark. Of course, for each $m$, $f_m$ is a set by separation ($f_m \subseteq \omega \times \omega$).
The class

$$\{((m, n), f_m(n)) : (m, n) \in \omega^2\} \tag{1}$$

is a set by collection (III.8.9). It is also single-valued in the second projection,
for $(m, n) = (m', n')$ implies $m = m'$, and hence (uniqueness, V.1.21) also
$f_m = f_{m'}$. Finally, this implies $f_m(n) = f_{m'}(n')$, for our assumption yields $n = n'$,
and the $f_m$ are functions.

We will call the set (1) “+”, as tradition has it. That is, $+ : \omega \times \omega \to \omega$
satisfies (i.e., we can prove in ZFC) the following:

$$\begin{aligned} m + 0 &= m\\ m + (n + 1) &= (m + n) + 1 \end{aligned}$$

In form, as it was dictated by intuition, the recursive definition of addition on
$\omega$ is identical to the recursive definition of addition on $\mathbb{N}$ (only the interpretation
of the nonlogical symbols changes). Naturally we expect addition on $\omega$ to enjoy
the same properties as that over $\mathbb{N}$.
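The two defining equations of addition translate directly into a recursion on numerals. This is a minimal sketch under the frozenset modelling assumption; the helper names `succ`, `nat`, `pr`, and `add` are hypothetical, and `pr` relies on the fact that the predecessor of a nonzero numeral is its element with the most members.

```python
EMPTY = frozenset()

def succ(n):
    """Successor n ∪ {n}, written n + 1 in the text."""
    return n | frozenset([n])

def nat(k):
    """The numeral for the Python int k >= 0."""
    n = EMPTY
    for _ in range(k):
        n = succ(n)
    return n

def pr(n):
    """Predecessor of a numeral: ∅ for ∅, otherwise the largest element of n."""
    return max(n, key=len) if n else EMPTY

def add(m, n):
    """m + 0 = m and m + (n + 1) = (m + n) + 1, by recursion on n."""
    return m if n == EMPTY else succ(add(m, pr(n)))

assert add(nat(2), nat(3)) == nat(5)
# Commutativity, checked only on a small range.
assert all(add(nat(i), nat(j)) == add(nat(j), nat(i))
           for i in range(5) for j in range(5))
```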

We note that the “+” just introduced is an informal abbreviation for a subset
of $\omega^3$ (given by the set term (1) above) that also happens to be an important
function. If we wish (we do not, however), we can also introduce a new formal
function symbol, say, “$f_+$” by the explicit definition

$$f_+(x, y) = \begin{cases} x + y & \text{if } x \in \omega \land y \in \omega\\ \emptyset & \text{otherwise} \end{cases}$$

The reader will recall that we bow to the requirement that formal function
symbols be total functions upon interpretation (e.g., over the standard model,
which here has as underlying “set” the proper class $U_M$). This is the reason for
the “otherwise” part. Trivially,

$$\vdash_{ZFC} x \in \omega \to f_+(x, 0) = x$$

and

$$\vdash_{ZFC} x \in \omega \to y \in \omega \to f_+(x, y \cup \{y\}) = f_+(x, y) \cup \{f_+(x, y)\}$$

□


V.1.24 Proposition (Commutativity of Addition). 


$$\vdash_{ZFC} (\forall n)(\forall m)(m + n = n + m)$$


Proof. We do induction on n. 


Basis. $n = 0$. We want to prove

$$(\forall m)(m + 0 = 0 + m) \tag{2}$$

Anticipating success, this will also entail (by commutativity of equality and the
Leibniz rule)

$$(\forall m)(0 + m = m + 0) \tag{2$'$}$$

Now $\vdash_{ZFC} (\forall m)(m + 0 = m)$ by V.1.23. It suffices to prove

$$(\forall m)(0 + m = m)$$

Regrettably, this requires an induction on $m$:


Basis. $m = 0$. That $\vdash_{ZFC} 0 + 0 = 0$ follows from V.1.23, which settles the
basis (of the $m$-induction).


Let (I.H. on $m$) $0 + m = m$ for frozen $m$. Then (using “=” conjunctionally
throughout this proof)

$$\begin{aligned} 0 + (m + 1) &= (0 + m) + 1 && \text{by V.1.23}\\ &= m + 1 && \text{by I.H. on } m \end{aligned}$$

This finally settles the basis (for $n$), namely, (2).

Assume now (I.H. on $n$) that

$$(\forall m)(m + n = n + m) \tag{3}$$

with frozen $n$. We embark on proving

$$(\forall m)(m + (n + 1) = (n + 1) + m) \tag{4}$$

We prove (4) by induction on $m$.


Basis. For $m = 0$ we want

$$0 + (n + 1) = (n + 1) + 0$$

which is provable by (2$'$) above via specialization.

We take now an I.H. on $m$, that is, we add the assumption

$$m + (n + 1) = (n + 1) + m \tag{5}$$

with frozen $m$ ($n$ was already frozen when we took assumption (3)).
We embark on the final trip, i.e., to prove

$$(m + 1) + (n + 1) = (n + 1) + (m + 1)$$

Well,

$$\begin{aligned} (n + 1) + (m + 1) &= ((n + 1) + m) + 1 && \text{by V.1.23}\\ &= (m + (n + 1)) + 1 && \text{by I.H. on } m \text{: (5)}\\ &= ((m + n) + 1) + 1 && \text{by V.1.23}\\ &= ((n + m) + 1) + 1 && \text{by I.H. on } n \text{: (3) and specialization}\\ &= (n + (m + 1)) + 1 && \text{by V.1.23}\\ &= ((m + 1) + n) + 1 && \text{by I.H. on } n \text{: (3) and specialization}\\ &= (m + 1) + (n + 1) && \text{by V.1.23} \end{aligned}$$

(4) is now settled.


The reader has just witnessed an application of the dreaded double induction.
That is, to prove

$$(\forall n)(\forall m)\mathcal{P}(m, n) \tag{6}$$

one starts, in good faith, an induction on $n$. En route it turns out that in order to
get unstuck one has to do an induction on $m$ as well, towards proving the basis
$(\forall m)\mathcal{P}(m, 0)$ and the induction step $(\forall m)\mathcal{P}(m, n) \to (\forall m)\mathcal{P}(m, n + 1)$.


The good news is that it is not always necessary to do a double induction in
order to prove something like (6). See for example the proof of the next result. □


The reader can prove the associativity of + (Exercise V.6), which we take
for a fact from now on. Since, intuitively, $n = \{0, 1, \ldots, n - 1\}$, then, intuitively,
$n + m = \{0, 1, \ldots, n - 1, n + 0, n + 1, \ldots, n + (m - 1)\}$. That is, to obtain
$n + m$ we “concatenate” to the right of $n$ the elements of $m$ “shifted” by $n$.
Formally this is true:


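V.1.25 Theorem.

$$\vdash_{ZFC} (\forall n)(\forall m)(n + m = n \cup \{n + i : i \in m\})$$

Proof. By generalization, it suffices to prove

$$(\forall m)(n + m = n \cup \{n + i : i \in m\})$$

by induction on $m$.


Basis. For $m = 0$ (i.e., $m = \emptyset$) the claim amounts to $n + 0 = n \cup \emptyset$, while,
trivially, $\vdash_{ZFC} n \cup \emptyset = n$. Done by V.1.23.


I.H. Assume

$$n + m = n \cup \{n + i : i \in m\}$$

for frozen $m$ and $n$. We look at the case $m + 1$: The left-hand side is

$$\begin{aligned} n + (m + 1) &= (n + m) + 1 && \text{by V.1.23}\\ &= (n + m) \cup \{n + m\} && \text{(expanding “+1”)} \end{aligned} \tag{1}$$

The right-hand side is

$$\begin{aligned} n \cup \{n + i : i \in m \cup \{m\}\} &= n \cup \{n + i : i \in m\} \cup \{n + m\}\\ &= (n + m) \cup \{n + m\} && \text{by I.H.} \end{aligned}$$

By (1), we are done.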
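The identity of V.1.25 can be spot-checked on small numerals. As before, this is only a sketch under the frozenset modelling assumption; `add` implements the recursion $m + 0 = m$, $m + (n + 1) = (m + n) + 1$, and `shifted_union` is a hypothetical name for the right-hand side of the theorem.

```python
EMPTY = frozenset()

def succ(n):
    return n | frozenset([n])

def nat(k):
    """The numeral for the Python int k >= 0."""
    n = EMPTY
    for _ in range(k):
        n = succ(n)
    return n

def add(m, n):
    """m + 0 = m and m + (n + 1) = (m + n) + 1; the predecessor is the largest element."""
    return m if n == EMPTY else succ(add(m, max(n, key=len)))

def shifted_union(n, m):
    """n ∪ {n + i : i ∈ m}: the elements of m shifted by n, appended to n."""
    return n | frozenset(add(n, i) for i in m)

assert all(add(nat(n), nat(m)) == shifted_union(nat(n), nat(m))
           for n in range(5) for m in range(5))
```

For instance, with $n = 2$ and $m = 3$ the right-hand side is $\{0, 1\} \cup \{2, 3, 4\}$, which is the numeral 5.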


V.1.26 Theorem.

$$\vdash_{ZFC} (\forall n)(\forall m)(n \leq m \to (\exists! i)\, n + i = m)$$

Proof. Existence. We argue by contradiction, so we add the assumption

$$(\exists n)(\exists m)(n \leq m \land \neg(\exists i)\, n + i = m)$$

Let then (invoking proof by auxiliary constant between the lines, as in the
proof of V.1.20) $n_0$ be smallest such that†

$$(\exists m)(n_0 \leq m \land \neg(\exists i)\, n_0 + i = m) \tag{1}$$

and next let $m_0$ be smallest such that

$$n_0 \leq m_0 \land \neg(\exists i)\, n_0 + i = m_0 \tag{2}$$

† That is, “add the new constant $n_0$ and the assumption (1) along with
$k < n_0 \to \neg(\exists m)(k \leq m \land \neg(\exists i)\, k + i = m)$”.

Because of $\vdash_{ZFC} n_0 + 0 = n_0$, (2) yields $\neg m_0 = n_0$: otherwise (i.e., adding the
assumption $m_0 = n_0$) we get $\vdash_{ZFC} n_0 + 0 = m_0$ and hence $\vdash_{ZFC} (\exists i)\, n_0 + i = m_0$
by the substitution axiom (Ax2), contradicting (2).

Thus, from (2) (first conjunct, via V.1.19, p. 242), $\vdash_{ZFC} n_0 \in m_0$; hence
$\vdash_{ZFC} m_0 \neq 0$. It follows that $m_0 = \mathrm{pr}(m_0) \cup \{\mathrm{pr}(m_0)\}$ (V.1.15); therefore
$\vdash_{ZFC} n_0 \leq m_0 - 1$ (V.1.19, p. 242). By minimality of $m_0$, there is an (auxiliary
constant) $i$ such that $n_0 + i = m_0 - 1$; hence we have a ZFC proof of $m_0 =
(n_0 + i) + 1 = n_0 + (i + 1)$ (employing “=” conjunctionally), contradicting
the property (2) of $m_0$.


Uniqueness. Let $n_0$ be smallest such that

$$n_0 + i = n_0 + j \tag{3}$$

for some $i \neq j$. Now $n_0 \neq 0$, for $\vdash_{ZFC} i = 0 + i$ and $\vdash_{ZFC} 0 + j = j$ (V.1.24).
Thus we can write (3) (using commutativity) as (i.e., we can prove)

$$(n_0 - 1) + i + 1 = (n_0 - 1) + j + 1$$

Since $i + 1 \neq j + 1$ (Lemma V.1.9), we have just contradicted the minimality
of $n_0$.


V.1.27 Definition (Difference of Natural Numbers). By V.1.26, the relation
over $\omega$ below, which we denote by the informal symbol “$-$”,

$$\{((m, n), i) : m = n + i\}$$

is single-valued in the second projection, that is, a function

$$- : \omega^2 \to \omega$$

in the sense of III.11.12, called difference. By V.1.26,

$$\vdash_{ZFC} n \leq m \to m - n \downarrow$$

and

$$\vdash_{ZFC} n \leq m \to m = n + (m - n)$$

while

$$\vdash_{ZFC} m < n \to m - n \uparrow$$

(1) We are painfully aware of the multiple meanings of the symbol “$-$” in
set theory as set difference and, now, natural number difference, but such
“overloading” of symbol meaning is common in mathematics.

(2) The difference between natural numbers does not coincide with the set
difference of the two numbers. For example, in the former sense, $2 - 1 =
1 = \{0\}$, while in the latter sense $2 - 1 = \{0, 1\} - \{0\} = \{1\}$. The context
will alert the reader if we (ever) perform $m - n$ in the “set sense” rather
than the (normally) “natural number sense”.

(3) Number difference is consistent with the earlier introduction of “$-$” in the
context of predecessor. In the former sense, if $n \geq 1$, then $n - 1$ is the unique
number $m$ such that $m \cup \{m\} = n$, i.e., $m + 1 = n$; that $m$ is precisely the
predecessor of $n$.
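The contrast in item (2) between number difference and set difference is easy to reproduce. The sketch assumes the frozenset model of numerals; `diff` is a hypothetical helper that searches for the unique $i$ with $n + i = m$ and returns `None` where the text writes $m - n \uparrow$.

```python
EMPTY = frozenset()

def succ(n):
    return n | frozenset([n])

def nat(k):
    """The numeral for the Python int k >= 0."""
    n = EMPTY
    for _ in range(k):
        n = succ(n)
    return n

def add(m, n):
    """m + 0 = m and m + (n + 1) = (m + n) + 1; the predecessor is the largest element."""
    return m if n == EMPTY else succ(add(m, max(n, key=len)))

def diff(m, n):
    """The unique i with n + i = m, or None when m < n (difference undefined)."""
    for i in succ(m):            # the candidates 0, 1, ..., m
        if add(n, i) == m:
            return i
    return None

assert diff(nat(2), nat(1)) == nat(1)               # number sense: 2 - 1 = 1 = {0}
assert nat(2) - nat(1) == frozenset([nat(1)])       # set sense: {0, 1} - {0} = {1}
assert diff(nat(1), nat(2)) is None                 # 1 - 2 is undefined
```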


V.1.28 Definition (Finite Sequences). A finite sequence is a function $f$ such
that $\mathrm{dom}(f) \in \omega$.

If $\mathrm{dom}(f) = 0$, then $f$ is the empty sequence. The length of the sequence $f$
is $\mathrm{dom}(f)$. If $i \in \mathrm{dom}(f)$, then $f(i)$ is the $i$th element of the sequence.


Intuitively, a sequence $f$ is $[f(0), \ldots, f(n - 1)]$ where $n = \mathrm{dom}(f) \neq 0$.
If $\mathrm{dom}(f) = 0$, then we have the empty sequence $[\ ]$.


We wish to distinguish between vectors and finite sequences although in a sense 
the two work the same way. They both give “order information” for a (finite) 
set. The technical differences are as follows: 


(1) The vector $(a, b)$ is the set $\{a, \{a, b\}\}$, while the sequence $[a, b]$ is the
set $\{(0, a), (1, b)\} = \{\{0, \{0, a\}\}, \{1, \{1, b\}\}\}$; thus they are different as
sets.

(2) The vector $(x_1, \ldots, x_n)$ has the informal $n$ in its name, so that $n$ cannot
be manipulated by the formalism.‡ Thus $(x_1, \ldots, x_n)$ is much like a set
of unrelated variables X1, ..., Xn in a programming language, while a
sequence $f = \{(0, x_1), \ldots, (n - 1, x_n)\}$ not only gives the same positional
information, but also behaves like an array $f(i)$ for $i = 0, \ldots, n - 1$ in a
programming language; for the $i$ in $f(i)$ has formal status.


We often need to juxtapose or concatenate two sequences $f$ and $g$ to obtain
$[f(0), \ldots, f(n - 1), g(0), \ldots, g(m - 1)]$ where $\mathrm{dom}(f) = n$, $\mathrm{dom}(g) = m$.
Intuitively, the concatenation of $f$ and $g$, in that order, is a sequence, for we
can write it as $[f(0), \ldots, f(n - 1), \tilde g(n + 0), \ldots, \tilde g(n + (m - 1))]$, where
$\tilde g(n + i) = g(i)$ for all $i \in \mathrm{dom}(g)$ and undefined otherwise. That is, the
concatenation is the function $f \cup \tilde g$.


‡ One can of course revisit the definition of $(\ldots)$ and redefine it in terms of the formal numbers $n$.
Such rewriting of history will be unwise in view of the commotion it will create. As it stands we
are doing fine: The original definition allowed the theory to bootstrap itself up to a point where
the present more general and more flexible definition of sequence was given.


V.1.29 Definition (Concatenation of Finite Sequences). If $f$ and $g$ are finite
sequences, then the relation over $\omega$ given by

$$f \cup \{(\mathrm{dom}(f) + i, g(i)) : i \in \mathrm{dom}(g)\}$$

(and denoted by $f * g$) is called their concatenation, in the order $f$ followed
by $g$.


V.1.30 Proposition. For any two finite sequences f and g, f * g is a finite 
sequence of length dom(f) + dom(g). 


Proof. Observe that $\mathrm{dom}(f) \subseteq \mathrm{dom}(f) + i$ (by V.1.25); hence $\mathrm{dom}(f) + i \notin
\mathrm{dom}(f)$. In other words the domains of $f$ and $\{(\mathrm{dom}(f) + i, g(i)) : i \in \mathrm{dom}(g)\}$
are disjoint.

Thus, $f * g$ is a function. Its domain, in view of the previous comment, is
$\mathrm{dom}(f) \cup \{\mathrm{dom}(f) + i : i \in \mathrm{dom}(g)\}$, which is $\mathrm{dom}(f) + \mathrm{dom}(g)$ by V.1.25.
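The definition of $f * g$ has a direct computational reading. In the sketch below a finite sequence is assumed to be a Python dict keyed by $0, \ldots, n - 1$, so that `len(f)` plays the role of $\mathrm{dom}(f)$; `concat` is a hypothetical name for $*$.

```python
def concat(f, g):
    """f * g = f ∪ {(dom(f) + i, g(i)) : i ∈ dom(g)} for sequences stored as dicts."""
    n = len(f)                      # dom(f), viewed as a natural number
    h = dict(f)
    for i, value in g.items():
        h[n + i] = value            # shift the indices of g by dom(f)
    return h

f = {0: "a", 1: "b"}
g = {0: "c"}
assert concat(f, g) == {0: "a", 1: "b", 2: "c"}
assert len(concat(f, g)) == len(f) + len(g)   # length dom(f) + dom(g)
assert concat({}, g) == g == concat(g, {})    # the empty sequence is neutral
assert concat(f, g) != concat(g, f)           # not commutative in general
```

The shifted keys never collide with the keys of `f`, which is the disjointness of domains used in the proof.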


V.1.31 Corollary. If $f$ is the empty sequence and $g$ is some sequence, then
$f * g = g * f = g$.


$f * g \neq g * f$ in general: Exercise V.10.


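V.1.32 Proposition. $\vdash_{ZFC} \neg n < 0$.


Proof. That is, $\vdash_{ZFC} \neg n \in \emptyset$.


V.1.33 Proposition. $\vdash_{ZFC} n < m + 1 \leftrightarrow n < m \lor n = m$.


Proof. That is, $\vdash_{ZFC} n \in m \cup \{m\} \leftrightarrow n \in m \lor n = m$.

Some more arithmetic over $\omega$ is delegated to the Exercises section.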


The reader who has read volume 1, Chapter I, now armed with V.1.8, V.1.9,
V.1.23, Exercise V.11 (which introduces multiplication over $\omega$), V.1.20, V.1.32,
and V.1.33 along with the induction principle over $\omega$, will see with a minimum
of effort or imagination that the Gödel incompleteness theorems hold for ZFC,
a fact that we took as a given in many earlier discussions.

Indeed, one need only carry out the formal Gödel numbering of volume 1,
Chapter II, within ZFC (rather than within PA) using terms $t$ of ZFC that
(provably) satisfy $t \in \omega$ as Gödel numbers of formulas, terms, and proofs in
(any extension of) $L_{\mathrm{Set}}$. In this endeavour the proved results (for $\omega$) that we
enumerated above, suggesting them as appropriate ammunition, play the role
of ROB and induction, which, in arithmetic, were assumed axiomatically in
volume 1.

Moreover, if $\Gamma$ denotes the set of individual ZFC axioms,† then it is easy
to prove that the corresponding formula $\Gamma(x)$ is recursive. Everything else has
already been done in the aforementioned chapter. □


V.2.1 Informal Definition. Let $P$ and $S$ be two relations. Then $P \circ S$, their
composition in that order, is defined by

$$y\,P \circ S\,x \quad\text{stands for}\quad (\exists z)(y\,P\,z \land z\,S\,x)$$

or, equivalently,

$$y \in (P \circ S)(x) \quad\text{abbreviates}\quad (\exists z \in S(x))\, y \in P(z)$$

We are adopting the notational convention that “$P \circ S(x)$” means “$(P \circ S)(x)$”,
that is, we render the use of brackets redundant.


V.2.2 Remark. Intuitively, $y \in P \circ S(x)$ iff there is a “stepping stone”, $z$, such
that $S$ sends $x$ to $z$ and $P$ sends the latter to $y$.

If $S$ is a function, then $y\,P\,S(x)$ (that is, $(S(x), y) \in P$; the reader may
wish to review notational issues; see III.11.4 and III.11.14) means, by
Definition III.11.16, $(\exists z)(z = S(x) \land y\,P\,z)$. This says the same thing as $y\,P \circ S\,x$ by
Definition V.2.1 (remembering that $z = S(x)$ iff $z\,S\,x$ for functions, III.11.14).


V.2.3 Lemma. For any relations $P$ and $S$ and all $x$, $P \circ S(x) = P[S(x)]$.


† Recall that separation and collection denote infinitely many axioms, and so does foundation in
the form we have adopted, although the latter can be replaced by a single axiom.

Proof. $\subseteq$: Let $y \in P \circ S(x)$. Then, for some $z$, $y \in P(z) \land z \in S(x)$.‡ That is,
$y \in P[S(x)]$ (Definition III.11.4).

$\supseteq$: Let $y \in P[S(x)]$. Then, for some $z \in S(x)$, one has $y \in P(z)$; hence
$y \in P \circ S(x)$ by V.2.1.


V.2.4 Corollary. For any relations $P$ and $S$ and all $X$, $P \circ S[X] = P[S[X]]$.


Lemma V.2.3 justifies, at long last, the convention of writing “$y\,P\,x$” for
“$(x, y) \in P$”.

Of course, the “standard” convention is to write instead $x\,P\,y$, but it has a
serious drawback: For functions $f$, $g$ viewed as special cases of relations (thus
using notation acceptable for relations) $x\,f \circ g\,y$ would mean that $x$ is input
for $f$ whose output is input to $g$; the latter yields $y$. In short, $y = g(f(x))$.
“Standard” notation goes a step further to write $y = g \circ f(x)$, thus introducing
the well-known (from elementary discrete mathematics texts) “reversal” from
$f \circ g$ (when we are composing $f$ and $g$ “viewed as relations”) to $g \circ f$ (when
we are composing $f$ and $g$ “viewed as functions”).

On the other hand, at the cost of being a bit unconventional at the outset,
when $y\,P\,x$ was defined, we proposed a convention that holds uniformly for
relations and functions when it comes to composition. This (by Lemma V.2.3)
says “in $y\,P \circ S\,x$, $S$ acts first, on input $x$; one of its outputs, if it is fed to
$P$, will cause output (possibly among other things) $y$”. □


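V.2.5 Example. Let $R = \{(1, 2), (1, 3)\}$ and $S = \{(1, 1), (2, 1)\}$. Then

$$\begin{aligned} x\,S \circ R\,y \quad&\text{iff}\quad (\exists z)(x\,S\,z \land z\,R\,y)\\ &\text{iff}\quad (\exists z)((y, z) \in R \land (z, x) \in S) \end{aligned}$$

Thus $S \circ R = \{(1, 1)\}$. On the other hand, one similarly calculates that $R \circ S =
\{(1, 2), (1, 3), (2, 2), (2, 3)\}$.
Therefore, in general, $\nvdash_{ZFC} S \circ R = R \circ S$. □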
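The two compositions of the example can be recomputed mechanically. The sketch assumes relations are Python sets of pairs and follows the convention of V.2.1, where “$y\,P\,x$” means $(x, y) \in P$, so that in $P \circ S$ the relation $S$ acts first; `compose` is a hypothetical helper name.

```python
def compose(P, S):
    """P ∘ S as a set of pairs: (x, y) is in it when (x, z) ∈ S and (z, y) ∈ P for some z."""
    return {(x, y) for (x, z) in S for (w, y) in P if w == z}

R = {(1, 2), (1, 3)}
S = {(1, 1), (2, 1)}

assert compose(S, R) == {(1, 1)}
assert compose(R, S) == {(1, 2), (1, 3), (2, 2), (2, 3)}
assert compose(S, R) != compose(R, S)   # composition is not commutative
```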


V.2.6 Lemma. For any relations $P$, $S$, $T$

$$P \circ (S \circ T) = (P \circ S) \circ T$$

that is, composition is associative.


‡ The reader who may long for the earlier tediously formal proof style will note that “$z$” can be
thought of as the name of an auxiliary constant here.

Digression. By the (informal) definition of class equality (cf. III.4.7 and III.4.8)
and the deduction theorem, a formula $A = B$ is proved by “letting $x \in A$”
(frozen $x$) and then proving $x \in B$ to settle “$\subseteq$”, subsequently repeating these
steps with the roles of $A$ and $B$ reversed. We have already employed this
technique in the proof of V.2.3.

When we deal with relations $P$ and $S$ and we want to prove $P = S$, the above
technique translates to “letting” $x\,P\,y$ in the “$\subseteq$-direction”. The reason is that
one really “lets”

$$z \in P \tag{1}$$

Then (cf. III.11.1), $OP(z)$ follows, that is,

$$(\exists u)(\exists v)(u, v) = z \tag{2}$$

Letting now $x$ and $y$ be auxiliary (new) constants, we can add the assumption
$(y, x) = z$ so that (1) becomes

$$x\,P\,y \tag{3}$$

With some work, one then proves $x\,S\,y$, that is, $z \in S$. This settles $P \subseteq S$.
Thus, in practice, one is indeed justified in suppressing the steps (1)-(2) and
starting by “letting” (3).
Proof. $\subseteq$: Let $x\,P \circ (S \circ T)\,y$. Then (by V.2.1)

$$(\exists z)(x\,P\,z \land z\,(S \circ T)\,y)$$

Hence (by V.2.1)

$$(\exists z)\bigl(x\,P\,z \land (\exists w)(z\,S\,w \land w\,T\,y)\bigr)$$

Hence ($w$ is not free in $x\,P\,z$)

$$(\exists w)(\exists z)(x\,P\,z \land z\,S\,w \land w\,T\,y)$$

Hence ($z$ is not free in $w\,T\,y$)

$$(\exists w)\bigl((\exists z)(x\,P\,z \land z\,S\,w) \land w\,T\,y\bigr)$$

Hence (by V.2.1)

$$(\exists w)\bigl((x\,P \circ S\,w) \land w\,T\,y\bigr)$$

Hence (by V.2.1)

$$x\,(P \circ S) \circ T\,y$$

The case for $\supseteq$ is entirely analogous.


In view of the associativity of $\circ$, brackets are redundant in a chain of
compositions involving the same relation $P$. In particular,

$$\underbrace{P \circ \cdots \circ P}_{n \text{ copies}}$$

depends only on $P$ and the number of copies, $n$. It should be naturally denoted
by $P^n$. Here we have been thinking in informal (metamathematical) terms, i.e.,
$n \in \mathbb{N}$.

We can say the same thing by an informal inductive definition (over N): 


V.2.7 Tentative Definition (“Positive” Powers of a Relation). Let $P$ be any
relation. Then

$$\begin{aligned} P^1 &\overset{\text{def}}{=} P\\ P^{n+1} &\overset{\text{def}}{=} P^n \circ P \quad\text{for any } n \in \mathbb{N} \text{ such that } n > 0 \end{aligned}$$

This tentative definition is acceptable, but it has the drawback that it hides
$n$ in the name, as we have already discussed in the preamble of Section V.1.
We can fix this easily, if $P$ is a set, by making V.2.7 into a formal recursive
definition of a function $n \mapsto P^n$ on $\omega$,† replacing “$n \in \mathbb{N}$” by “$n \in \omega$”.‡

However we want to afford our exposition the generality that $P$ may be a
proper class. Intuitively, $x\,P^n\,y$ ($n \in \omega$ and $n \neq 0$) should mean that for some
sequence $[f_0, f_1, \ldots, f_n]$,

$$f_0\,P\,f_1\,P \cdots P\,f_{n-1}\,P\,f_n$$

where $x = f_0$ and $y = f_n$. □


V.2.8 Definition (“Positive” Powers of a Relation). For any relation $P$
(possibly a proper class), and any $n \in \omega - \{0\}$, the relation $P^n$ is defined by

$$x\,P^n\,y \text{ abbreviates } n \in \omega - \{0\} \land (\exists f)\bigl(f \text{ is a function} \land \mathrm{dom}(f) = n + 1 \land f(0) = x \land f(n) = y \land (\forall j)(j < n \to f(j)\,P\,f(j + 1))\bigr)$$

The reader already knows how to express “$f$ is a function” within set theory. □


† Technically, we then also need a meaning for $P^0$, i.e., a value of the defined function at 0. As
such we can take, for example, $\{(x, x) : x \in \mathrm{field}(P)\}$.

‡ The requirement that $P$ be a set makes the pairs $(n, P^n)$ of the recursively defined function
meaningful, for the two components of a pair must be sets or atoms.

V.2.9 Definition (Relational Identity). The identity or diagonal relation on
the class $A$ is $1_A : A \to A$ (also denoted $\Delta_A : A \to A$) given by $1_A =
\{(x, x) : x \in A\}$. Thus, $1_A$ is a total function $A \to A$.

If $A$ is understood, then we simply write $1$ or $\Delta$.†


For any $P : A \to A$ we let $P^0$ abbreviate $1_A$.


Some people define $P^0$ as $\{(x, x) : x = x\}$, but we prefer to make $P^0$
context-sensitive. We note that $P^n$ for $n > 0$ does not depend on the context $A$ in which
$P$ was given (as $P : A \to A$).
Thus, if we have a relation $R$ on a set $A$, then $R^0$ is the set $\{(x, x) : x \in A\}$
rather than the proper class $\{(x, x) : x = x\}$.


Pause. Why is $\{(x, x) : x = x\}$ a proper class? □


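V.2.10 Example. Let $A = \{1, 2, 3\}$. Then $1_A = \Delta_A = \{(1, 1), (2, 2), (3, 3)\}$.


V.2.11 Lemma. For each $P : A \to A$, one has $P \circ \Delta = \Delta \circ P = P$.


Proof. We have

$$\begin{aligned} &y \in P \circ \Delta(x)\\ \text{iff (Lemma V.2.3)}\quad &y \in P[\Delta(x)]\\ \text{iff (Definition V.2.9)}\quad &y \in P[\{x\}]\\ \text{iff}\quad &y \in P(x) \end{aligned}$$

Thus $P \circ \Delta = P$. Similarly, $\Delta \circ P = P$.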


V.2.12 Proposition. For any relation $P$,

$$\begin{aligned} &\vdash_{ZFC} P^1 = P\\ &\vdash_{ZFC} P^{n+1} = P^n \circ P \quad\text{for any } n \in \omega - \{0\} \end{aligned}$$

† The context will guard against confusing this $\Delta$ with that of volume 1, Chapter II.

Proof. By V.2.8, $x\,P^1\,y$ abbreviates

$$(\exists f)\bigl(f \text{ is a function} \land \mathrm{dom}(f) = 2 \land f(0) = x \land f(1) = y \land (\forall j)(j < 1 \to f(j)\,P\,f(j + 1))\bigr) \tag{1}$$

Since $\vdash_{ZFC} j < 1 \leftrightarrow j = 0$ (recall that “$j < 1$” means “$j \in \{\emptyset\}$”), the one
point rule (I.6.2) yields at once that (1) is provably equivalent to the following:

$$(\exists f)\bigl(f \text{ is a function} \land \mathrm{dom}(f) = 2 \land f(0) = x \land f(1) = y \land f(0)\,P\,f(1)\bigr) \tag{2}$$

(2), in turn, is provably equivalent to $x\,P\,y$.
To see the truth of the last claim, in view of (2) we introduce a new constant
$f_0$ and the assumption

$$f_0 \text{ is a function} \land \mathrm{dom}(f_0) = 2 \land f_0(0) = x \land f_0(1) = y \land f_0(0)\,P\,f_0(1) \tag{3}$$

the last three conjuncts of which yield $x\,P\,y$.
Conversely, assuming $x\,P\,y$, we get trivially

$$\{(0, x), (1, y)\} \text{ is a function} \land \mathrm{dom}(\{(0, x), (1, y)\}) = 2 \land x = x \land y = y \land x\,P\,y \tag{4}$$

which yields (2) by the substitution axiom Ax2. This settles $P^1 = P$.
Next, assume

    n > 0        (5)
and
    x P^{n+1} y        (6)

These imply (by V.2.8)

    n > 0 ∧ (∃f)(f is a function ∧ dom(f) = n + 2 ∧
         f(0) = x ∧ f(n + 1) = y ∧ (∀j)(j < n + 1 → f(j) P f(j + 1)))        (7)

Note that x P^{n+1} y (cf. V.2.8) contributes the redundant (by V.1.8; hence not
included in (7)) conjunct n + 1 > 0. By V.1.33 and employing tautological
equivalences, distributivity of ∀ over ∧, and the one point rule, (7) is provably
equivalent to

    n > 0 ∧ (∃f)(f is a function ∧ dom(f) = n + 2 ∧ f(0) = x ∧ f(n + 1) = y ∧
         (∀j)(j < n → f(j) P f(j + 1)) ∧ f(n) P f(n + 1))        (8)

(8) allows the introduction of a new constant h and of the accompanying
assumption

    h is a function ∧ dom(h) = n + 2 ∧ h(0) = x ∧
    (∀j)(j < n → h(j) P h(j + 1)) ∧ h(n) P y        (9)

or, setting g = h ↾ (n + 1), which implies g(n) = h(n) in particular,

    g is a function ∧ dom(g) = n + 1 ∧ g(0) = x ∧ g(n) = h(n) ∧
    (∀j)(j < n → g(j) P g(j + 1)) ∧ h(n) P y

which, by (5) and the substitution axiom, yields x P^n h(n) P y. Hence x P^n ∘ P y.
The reader will have no trouble establishing the converse.


Proposition V.2.12 captures formally the essence of Tentative Definition V.2.7
for any P (even a proper class), avoiding the technical difficulty (indeed,
impossibility) of defining a proper-class-valued function.

We observe that for a relation P on A, Definition V.2.9 is in harmony
with V.2.12 in the sense that

    P^{0+1} = P^1
            = P            by V.2.12
and
    P^0 ∘ P = Δ ∘ P
            = P            by V.2.11

so that in this case ⊢_ZFC P^{n+1} = P^n ∘ P for n ≥ 0.


V.2.13 Lemma (The Laws of Exponents). For any P : A → A and j, m in ω,

(a) ⊢_ZFC P^j ∘ P^m = P^{j+m}
(b) ⊢_ZFC (P^m)^j = P^{mj}.†


† For multiplication of natural numbers see Exercise V.11.

In the statement of the lemma, as is the normal practice, we use “implied
multiplication”, i.e., “mj” means m · j. We will also follow the standard convention
that “·” has smaller scope (or higher “priority”) than “+”, so that m + nj means
m + (nj).


Proof. (a): Induction on m. The case m = 0 requires the provability of P^j ∘ 1 =
P^j, which is indeed the case by Lemma V.2.11. Now for the case m + 1 via the
m-case, using “=” conjunctionally:

    P^j ∘ P^{m+1} = P^j ∘ (P^m ∘ P)        (by V.2.12)
                  = (P^j ∘ P^m) ∘ P        (by associativity)
                  = P^{j+m} ∘ P            (by I.H.)
                  = P^{j+m+1}              (by V.2.12)

(b): Induction on j. The case j = 0 requires that (P^m)^0 = P^0 (i.e., 1 = 1).
For the case j + 1, via the j-case,

    (P^m)^{j+1} = (P^m)^j ∘ P^m        (by Proposition V.2.12)
                = P^{mj} ∘ P^m         (by I.H.)
                = P^{mj+m}             (by case (a))
                = P^{m(j+1)}           (by Exercise V.11)


By V.2.13(a), ⊢_ZFC P^j ∘ P^m = P^m ∘ P^j for any P : A → A and j, m in ω. That
is, powers of a relation commute with respect to composition.
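The laws of exponents can be spot-checked on a small finite relation. The sketch below is only an illustration under the assumption that P is a relation on the finite set A; `compose` and `power` are names chosen here, and `power(P, 0, A)` is taken to be the diagonal on A, following Definition V.2.9.

```python
def compose(P, S):
    """All (x, z) such that (x, y) is in P and (y, z) is in S for some y."""
    return {(x, z) for (x, y) in P for (y2, z) in S if y == y2}

def power(P, n, A):
    """P^n, with P^0 taken to be the diagonal on A."""
    result = {(x, x) for x in A}
    for _ in range(n):
        result = compose(result, P)  # P^{k+1} = P^k o P
    return result

A = {0, 1, 2, 3}
P = {(0, 1), (1, 2), (2, 3), (3, 0), (1, 1)}

for j in range(4):
    for m in range(4):
        # (a): P^j o P^m = P^{j+m}
        assert compose(power(P, j, A), power(P, m, A)) == power(P, j + m, A)
        # (b): (P^m)^j = P^{mj}
        assert power(power(P, m, A), j, A) == power(P, m * j, A)
print("laws of exponents hold on this example")
```

A finite check of this kind is of course no substitute for the inductive proof; it only confirms the statement on one example.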


V.2.14 Definition. A relation P is

(a) symmetric iff for all x, y, x P y implies y P x.
(b) antisymmetric iff for all x, y, x P y ∧ y P x implies x = y.
(c) transitive iff for all x, y, z, x P y ∧ y P z implies x P z.
(d) irreflexive iff for all x, y, x P y implies x ≠ y.
(e) reflexive on A iff (∀x ∈ A) x P x.


(1) It is clear that (a) above says more, namely “P is symmetric iff x P y ↔
    y P x”, for the names x, y can be interchanged in the definition.
(2) All concepts except reflexivity depend only on the relation P, while
    reflexivity is relative to a class A. If P : A → A and if it is reflexive on A, we
    usually say just “reflexive”.

Reflexivity on A clearly is tantamount to Δ_A ⊆ P.


V.2.15 Example. ∅ is all of (a)–(d). It satisfies (e) only on ∅.


V.2.16 Example. Δ_A : A → A is all of (a)–(c). It fails (d). It satisfies (e) on A.
Two obvious examples of irreflexive relations are ⊂ on classes and < on ω,
or, in the real realm, N.


V.2.17 Example (Informal). The congruence modulo m on Z is defined for
m > 1 (m ∈ Z) by

    a ≡_m b iff m | a − b

where “x | y” means that (∃z)(x · z = y).

In number theory, rather than “... ≡_m ...” one uses the “dismembered”
symbol “... ≡ ... (mod m)”.

We verify that ≡_m : Z → Z is reflexive, symmetric, and transitive but not
antisymmetric.†

Indeed, m | a − a for all a ∈ Z, which settles reflexivity. m | a − b → m | b − a
for all a, b settles symmetry. For transitivity we start with m | a − b and m | b − c,
that is, a − b = km and b − c = rm for k and r in Z. Thus, a − c = (k + r)m;
therefore a ≡_m c.

To see that antisymmetry fails, just consider the fact that while 0 ≡_m m and
m ≡_m 0, still 0 ≠ m.


V.2.18 Example (Informal). ≤ : N → N (“less than or equal” relation) is
reflexive, antisymmetric, and transitive, but not symmetric. For example, 3 ≤ 4
but 4 ≰ 3.‡


V.2.19 Example. Let R = {(1, 2), (2, 1), (2, 3), (3, 2), (1, 1), (2, 2), (3, 3)}
on the set {1, 2, 3}. R is reflexive and symmetric, but is not antisymmetric or
transitive.
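The properties of Definition V.2.14 are easy to test mechanically on finite relations. The following Python sketch (function names are illustrative, not from the text) checks the claims of Example V.2.19 for the relation R given there, reading a Python pair (x, y) as "x R y".

```python
def is_reflexive_on(P, A):
    return all((x, x) in P for x in A)

def is_symmetric(P):
    return all((y, x) in P for (x, y) in P)

def is_antisymmetric(P):
    return all(x == y for (x, y) in P if (y, x) in P)

def is_transitive(P):
    return all((x, z) in P for (x, y) in P for (y2, z) in P if y == y2)

def is_irreflexive(P):
    return all(x != y for (x, y) in P)

A = {1, 2, 3}
R = {(1, 2), (2, 1), (2, 3), (3, 2), (1, 1), (2, 2), (3, 3)}  # Example V.2.19

print(is_reflexive_on(R, A), is_symmetric(R))   # True True
print(is_antisymmetric(R), is_transitive(R))    # False False
```

Transitivity fails because 1 R 2 and 2 R 3 hold while 1 R 3 does not; antisymmetry fails because 1 R 2 and 2 R 1 hold with 1 ≠ 2.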


† It goes almost without saying that no relation can be irreflexive and reflexive on a nonempty
class.

‡ It is usual, whenever it is typographically elegant, to denote the negation of ...P..., i.e.,
¬ ...P..., by ...P̸..., for any relation P.


V.2.20 Proposition. P is symmetric iff P = P^{-1} (cf. 1.11.4). P is transitive
iff P² ⊆ P.


Proof. For symmetry: If part. Let P = P^{-1} and x P y. Then x P^{-1} y as well;
therefore y P x, so that P is symmetric.

Only-if part. ⊆: Let P be symmetric and x P y. It follows that y P x; therefore
x P^{-1} y (by the definition of P^{-1}).

This settles ⊆. The case for ⊇ is entirely analogous.


For transitivity: If part. Let P² ⊆ P and x P y P z.† Thus (definition of “∘”)
x P² z; hence x P z, so that P is transitive.


Only-if part. Assume transitivity and let x P² y. Thus (definition of “∘”)
x P z P y for some z (auxiliary constant). By transitivity, x P y; hence P² ⊆ P.


V.2.21 Example. For any relations P and S, ⊢_ZFC (P ∪ S)^{-1} = P^{-1} ∪ S^{-1}.
Indeed, let (x, y) ∈ (P ∪ S)^{-1}. Then

    (y, x) ∈ P ∪ S
Hence
    (y, x) ∈ P ∨ (y, x) ∈ S
Hence
    (x, y) ∈ P^{-1} ∨ (x, y) ∈ S^{-1}
Hence
    (x, y) ∈ P^{-1} ∪ S^{-1}

This settles ⊆. The argument can clearly be reversed to establish ⊇.


V.2.22 Definition (Closures). Given P.


(1) The reflexive closure of P with respect to A, r_A(P), is the ⊆-smallest relation
    S that is reflexive on A such that P ⊆ S.

(2) The symmetric closure of P, s(P), is the ⊆-smallest symmetric relation S
    such that P ⊆ S.

(3) The transitive closure of P, t(P), is the ⊆-smallest transitive relation S such
    that P ⊆ S. The alternative notation P^+ is often used to denote t(P).


“S is ⊆-smallest such that ℱ holds” means that if ℱ holds also for T, then
S ⊆ T.


† In analogy with the conjunctional use of “<” in x < y < z, one often uses an arbitrary relation
P conjunctionally, so that x P y P z stands for x P y ∧ y P z.

In anticipation of the following lemma, we used “the” as opposed to “a” in
Definition V.2.22.


V.2.23 Lemma (Uniqueness of Closures). Let S, T be two q-closures of P of
the same type (q ∈ {r_A, s, t}). Then S = T.


Proof. Let S pose as a q-closure of P. Then S ⊆ T.
Now let T pose as a q-closure of P. Then T ⊆ S. Hence S = T.


V.2.24 Lemma (Existence of Closures). Given a class A and a relation P.
Then

(a) r_A(P) = P ∪ Δ_A,
(b) s(P) = P ∪ P^{-1},
(c) P^+ = ⋃_{i=1}^{∞} P^i.


P^+ = ⋃_{i=1}^{∞} P^i is, of course, to be understood as an abbreviation of
x P^+ y ↔ (∃i)(i > 0 ∧ x P^i y).


Proof. (a): P ⊆ P ∪ Δ_A, and P ∪ Δ_A is reflexive on A. Let also T be reflexive
on A, and P ⊆ T. Reflexivity of T contributes Δ_A ⊆ T. Thus, P ∪ Δ_A ⊆ T.
So P ∪ Δ_A is ⊆-smallest.


(b): Trivially, P ⊆ P ∪ P^{-1}. By V.2.20 and V.2.21, P ∪ P^{-1} is symmetric
and hence a candidate for s-closure. Let now P ⊆ T where T is symmetric and
hence T = T^{-1}. Thus P^{-1} ⊆ T^{-1} = T, from which P ∪ P^{-1} ⊆ T. Done.


(c): Now P ⊆ ⋃_{i=1}^{∞} P^i by V.2.12.
Next, we argue that ⋃_{i=1}^{∞} P^i is transitive.

Let x (⋃_{i=1}^{∞} P^i) y and y (⋃_{i=1}^{∞} P^i) z. That is, x P^j y and y P^m z for some
(auxiliary constants, cf. remark prior to this proof) j, m (≥ 1), from which follows
(by V.2.13) x P^{j+m} z; hence (j + m ≥ 1 is provable, clearly) x (⋃_{i=1}^{∞} P^i) z, which
settles transitivity.


To establish ⋃_{i=1}^{∞} P^i as ⊆-smallest, let T be transitive and P ⊆ T. We claim
that j > 0 → P^j ⊆ T. We do (formal, of course) induction on j:


Basis. For j = 0 the claim is vacuously satisfied, since then 0 < j is
refutable.

We assume the claim for frozen j and proceed to prove†

    P^{j+1} ⊆ T        (1)

There are two cases:

Case j + 1 = 1. Then we are done by ⊢ P^1 = P (V.2.12) and the
assumption P ⊆ T.
Case j + 1 > 1. Thus ⊢ j > 0. Therefore we have a proof

    x P^{j+1} y
        ↓  (V.2.12)
    (∃z)(x P^j z ∧ z P y)
        ↓  (I.H. and assumption on T)
    (∃z)(x T z ∧ z T y)
        ↓  (T is transitive)
    x T y

which proves the induction step. ⋃_{i≥1} P^i ⊆ T follows at once.
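For finite relations the three formulas of Lemma V.2.24 translate directly into code. The sketch below is an illustration only; the function names are chosen here. The transitive closure is computed as the union P^1 ∪ P^2 ∪ ..., stopping as soon as a new power adds nothing, which is safe for finite P because every later power is then already contained in the union.

```python
def compose(P, S):
    """All (x, z) such that (x, y) is in P and (y, z) is in S for some y."""
    return {(x, z) for (x, y) in P for (y2, z) in S if y == y2}

def reflexive_closure(P, A):
    return P | {(x, x) for x in A}          # P ∪ Δ_A

def symmetric_closure(P):
    return P | {(y, x) for (x, y) in P}     # P ∪ P^{-1}

def transitive_closure(P):
    """Union of the positive powers of P (finite P assumed)."""
    closure, power = set(P), set(P)
    while True:
        power = compose(power, P)           # next power of P
        if power <= closure:                # nothing new: the union has stabilized
            return closure
        closure |= power

P = {(1, 2), (2, 3)}
print(reflexive_closure(P, {1, 2, 3}) == {(1, 2), (2, 3), (1, 1), (2, 2), (3, 3)})
print(symmetric_closure(P) == {(1, 2), (2, 1), (2, 3), (3, 2)})
print(transitive_closure(P) == {(1, 2), (2, 3), (1, 3)})
```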


V.2.25 Example. Why does a class A fail to be transitive? Because some set
x ∈ A has members that are not in A. If we fix this deficiency, by adding to
A the missing members, we will turn A into a transitive class. All we have to
do is to iterate the following process, until no new elements can be added:


Add to the current iterate of A (call this A_i) all those elements y, not
already included, such that y ∈ x ∈ A_i, for all choices of x.


So, if we add a y, then we must add also all the z ∈ y that were not already
included, and all the w ∈ z ∈ y that were not already included, .... In short, we
add an element w just in case w ∈ z ∈ y ∈ ··· ∈ x ∈ A for some z, y, ..., x.
With the help of the transitive closure, and switching notation from “∈ the
nonlogical symbol” to “∈, the relation {(x, y) : y ∈ x}”,‡ this is simply put as:


“Add a w just in case w ∈^+ x ∧ x ∈ A for some x”, or
“Add to the original A the class ∈^+[A], i.e., form A ∪ ∈^+[A].”


It turns out that A ∪ ∈^+[A] is the ⊆-smallest transitive class that has A as a
subclass, that is, it extends A to a transitive class in the most economical way
(see below).


† ⊢_ZFC j + 1 > 0 anyway.
‡ The rightmost “∈” is the nonlogical symbol.

V.2.26 Informal Definition (Transitive Closure of a Class). For any class A,
the informal symbol TC(A), pronounced the transitive closure of the class A,
stands for (abbreviates) A ∪ ∈^+[A].


V.2.27 Proposition.


(1) TC(A) is the ⊆-smallest transitive class that has A as a subclass.
(2) If A ⊆ TC(B), then TC(A) ⊆ TC(B).
(3) If A is a set, then so is TC(A).


Proof. (1): Trivially, A ⊆ TC(A). Next, let x ∈ y ∈ TC(A).

Case 1. y ∈ A. Then x ∈ (∈[A]) ⊆ (∈^+[A]),† since‡ ∈ is a subclass of
∈^+. Hence x ∈ TC(A).

Case 2. y ∈ ∈^+[A]. Say y ∈ ∈^i[A] for some i ∈ ω − {0}. Then x ∈
∈[∈^i[A]]. Moreover, we have the following simple calculation, where we
have used the leftmost predicates in each line conjunctionally.

    x ∈ ∈[∈^i[A]]
      = ∈ ∘ ∈^i [A],    by V.2.4
      = ∈^{i+1}[A],     by V.2.12
      ⊆ ∈^+[A]
      ⊆ TC(A)

Thus TC(A) is transitive.


Finally, let A ⊆ B and B be transitive. Let x ∈ ∈^+[A]. As above, x ∈
∈^i[A] for some i ∈ ω − {0}. We want to conclude that x ∈ B. For variety’s
sake we argue by contradiction, so let i_0 > 0 (auxiliary constant) be smallest
such that for some x_0 (auxiliary constant) the contention fails, that is, add

    x_0 ∈ ∈^{i_0}[A] ∧ ¬ x_0 ∈ B

Now i_0 ≠ 1 is provable, for, if we add i_0 = 1, then x_0 ∈ ∈^{i_0}[A] means that
(∃y)(x_0 ∈ y ∈ A); hence (∃y)(x_0 ∈ y ∈ B) by hypothesis. Thus, x_0 ∈ B, since
B is transitive, and we have just contradicted what we have assumed about
membership of x_0 in B.

Thus, i = i_0 − 1 > 0, and, by minimality of i_0,

    (∀y)(y ∈ ∈^i[A] → y ∈ B)        (∗)


† Brackets are inserted this once for the sake of clarity. They are omitted in the rest of the proof.
‡ It is rather easy to see, from the context, when “∈” stands for the relation and when for the
predicate.

Now
    x_0 ∈ ∈^{i_0}[A] → x_0 ∈ ∈[∈^{i_0 − 1}[A]],    by V.2.4
Hence
    x_0 ∈ y ∈ ∈^i[A] for some y
Therefore
    x_0 ∈ y ∈ B, by (∗)
and
    x_0 ∈ B, since B is transitive.


We have contradicted the choice of i_0 and x_0, and this settles (1).


(2) follows trivially from (1).
(3): For any class T,

    ∈[T] = {x : (∃y ∈ T) x ∈ y} = ⋃T

Thus, by induction on i, one can easily prove that ∈^i[A] is a set (having defined
∈^0[A] to mean A for convenience), since

    ∈^{i+1}[A] = ∈[∈^i[A]] = ⋃ ∈^i[A]

Then, by collection (e.g., III.11.28),

    S = {A, ∈[A], ∈²[A], ..., ∈^i[A], ...}

is a set, and (by union) so is TC(A) = ⋃S.


From the above we infer that another way to “construct” TC(A) is to throw in
all the elements of A, then all the elements of all the elements of A, then all
the elements of all the elements of all the elements of A, and so on. That is,

    TC(A) = A ∪ ⋃A ∪ ⋃⋃A ∪ ⋃⋃⋃A ∪ ···
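The "throw in the elements of the elements" description can be tried out on hereditarily finite sets. The sketch below is informal and rests on one modelling assumption: sets are represented by nested Python `frozenset` values, and anything that is not a `frozenset` is treated as an atom. The function name is chosen here for illustration.

```python
def transitive_closure_of_set(A):
    """TC(A) = A ∪ ⋃A ∪ ⋃⋃A ∪ ... for a hereditarily finite set A."""
    tc = set(A)
    frontier = set(A)
    while frontier:
        # members of the most recently added sets that were not collected yet
        new = {y for x in frontier if isinstance(x, frozenset) for y in x} - tc
        tc |= new
        frontier = new
    return tc

empty = frozenset()              # 0 = ∅
one = frozenset({empty})         # 1 = {∅}
two = frozenset({empty, one})    # 2 = {∅, {∅}}
A = {frozenset({two})}           # A = {{2}}, which is not transitive

tc = transitive_closure_of_set(A)
print(tc == {frozenset({two}), two, one, empty})
# every member of a member of tc is again in tc
print(all(y in tc for x in tc if isinstance(x, frozenset) for y in x))
```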


V.2.28 Remark. (1) It follows from Lemma V.2.24 (if that were not already
clear from the definition) that the s- and t-closures are only dependent on the
relation we are closing, and not on any other context. On the contrary, the
reflexive closure depends on a context A.

(2) We also note that closing a relation P amounts, intuitively, to adding
pairs (x, y) to P until the first time it acquires the desired property (reflexivity
on some A, or symmetry or transitivity). Correspondingly, P is reflexive on A,
or symmetric or transitive, iff it equals its corresponding closure. This readily
follows from the ⊆-minimality of closures.


V.2.29 Example (Informal). If A = {1, 2, ..., n}, n ≥ 1, then A × A has 2^{n²}
subsets, that is, there are 2^{n²} relations P : A → A. Fix attention on one such
relation, say, R.

Clearly then, the sequence R, R², R³, ..., R^i, ... has at most 2^{n²} distinct
terms, thus

    R^+ = ⋃_{i=1}^{2^{n²}} R^i

With some extra work one can show that

    R^+ = ⋃_{i=1}^{n} R^i

in this case. Moreover, if R is reflexive (on A, that is), then

    R^+ = R^{n−1}
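Both sharper formulas can be tested empirically on random relations over a small set. The sketch below is illustrative only: `warshall` is the standard Warshall algorithm, used here as an independent reference computation of the transitive closure (it is not taken from the text), and the sample size, density, and seed are arbitrary choices.

```python
import random

def compose(P, S):
    """All (x, z) such that (x, y) is in P and (y, z) is in S for some y."""
    return {(x, z) for (x, y) in P for (y2, z) in S if y == y2}

def union_of_powers(R, upto):
    """R^1 ∪ R^2 ∪ ... ∪ R^upto."""
    total, power = set(R), set(R)
    for _ in range(upto - 1):
        power = compose(power, R)
        total |= power
    return total

def warshall(R, A):
    """Reference transitive closure via Warshall's algorithm."""
    closure = set(R)
    for k in A:
        for i in A:
            for j in A:
                if (i, k) in closure and (k, j) in closure:
                    closure.add((i, j))
    return closure

random.seed(0)
n = 5
A = range(1, n + 1)
for _ in range(200):
    R = {(x, y) for x in A for y in A if random.random() < 0.25}
    assert union_of_powers(R, n) == warshall(R, A)   # R+ = R^1 ∪ ... ∪ R^n
    Rr = R | {(x, x) for x in A}                     # a reflexive relation on A
    p = set(Rr)
    for _ in range(n - 2):
        p = compose(p, Rr)                           # after the loop, p = Rr^(n-1)
    assert p == warshall(Rr, A)                      # R+ = R^(n-1) when R is reflexive
print("both formulas agree with the reference closure on all samples")
```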


V.2.30 Example. Let the “higher order collection” of relations (T_a)_{a∈I} be given
by a formula of set theory, 𝒯(a, x, y), in the sense that

    x T_a y abbreviates 𝒯(a, x, y)

so that ⋃_{a∈I} T_a stands for {(x, y) : (∃a ∈ I)𝒯(a, x, y)}. Let S be another
relation.
Then the following two (abbreviations of) formulas are provable in ZFC:

    S ∘ (⋃_{a∈I} T_a) = ⋃_{a∈I} (S ∘ T_a)        (1)

    (⋃_{a∈I} T_a) ∘ S = ⋃_{a∈I} (T_a ∘ S)        (2)

We prove (1), leaving (2) as an exercise. Let

    x S ∘ (⋃_{a∈I} T_a) y.

Then

    (∃z)(x S z ∧ z (⋃_{a∈I} T_a) y)

Hence (via some trivial logical manipulations)

    (∃a ∈ I)(∃z)(x S z ∧ z T_a y)

which yields

    (∃a ∈ I)(x S ∘ T_a y)

and finally

    x ⋃_{a∈I} (S ∘ T_a) y

This settles ⊆; the ⊇-part is similar.


V.2.31 Example. Consider now P : A → A. We will write Δ for Δ_A. We will
show that

    ⊢_ZFC (∀m)((Δ ∪ P)^m = ⋃_{i=0}^{m} P^i)        (3)

We do induction on m. For m = 0, (3) requires ⊢_ZFC Δ = Δ, a logical fact.
I.H.: Assume (3) for some fixed m ≥ 0.
Case m + 1 (employing “=” conjunctionally):

    (Δ ∪ P)^{m+1} = (⋃_{i=0}^{m} P^i) ∘ (Δ ∪ P)        (by I.H.)
                  = ⋃_{i=0}^{m} (P^i ∘ (Δ ∪ P))        (by V.2.30, case (2))
                  = ⋃_{i=0}^{m} (P^i ∪ P^{i+1})        (by V.2.30, case (1), and V.2.11)
                  = ⋃_{i=0}^{m+1} P^i

As an application of this result, we look into tr(P), or more correctly, t(r(P)):
the transitive closure of the reflexive closure of P. We “calculate” as follows:

    tr(P) = (Δ ∪ P)^+ = ⋃_{i=1}^{∞} (Δ ∪ P)^i = ⋃_{i=1}^{∞} ⋃_{k=0}^{i} P^k = ⋃_{i=0}^{∞} P^i

Thus

    tr(P) = ⋃_{i=0}^{∞} P^i        (4)

Next, look into rt(P) (really r(t(P)): the reflexive closure of the transitive
closure of P). Clearly,

    rt(P) = Δ ∪ P^+ = ⋃_{i=0}^{∞} P^i        (5)

By (4) and (5), ⊢_ZFC rt(P) = tr(P). We call this relation (rt(P) or tr(P))
the reflexive-transitive closure of P. It usually goes under the symbol P*.
Thus, intuitively, x P* y iff either x = y, or for some z_1, ..., z_{k−1} for k ≥ 1,
x P z_1 P z_2 ... z_{k−1} P y.†
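The equality rt(P) = tr(P) can be observed on a finite example. In the sketch below the names `r` and `t` mirror the closure operators of the text, but the code itself is only an illustration under the assumption that P is a finite relation on A.

```python
def compose(P, S):
    """All (x, z) such that (x, y) is in P and (y, z) is in S for some y."""
    return {(x, z) for (x, y) in P for (y2, z) in S if y == y2}

def r(P, A):
    """Reflexive closure on A: Δ ∪ P."""
    return P | {(x, x) for x in A}

def t(P):
    """Transitive closure: union of the positive powers of P (finite P)."""
    closure, power = set(P), set(P)
    while True:
        power = compose(power, P)
        if power <= closure:
            return closure
        closure |= power

A = {1, 2, 3, 4}
P = {(1, 2), (2, 3), (4, 4)}

star = t(r(P, A))
print(star == r(t(P), A))   # tr(P) = rt(P)
print((1, 3) in star, (3, 1) in star)
```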


V.2.32 Informal Definition (Adjacency Map). Given a relation P : A → B,
its adjacency map M_P is

    {((x, y), i) : x ∈ A ∧ y ∈ B ∧ i ∈ {0, 1} ∧ (i = 1 ↔ (x, y) ∈ P)}


In applications the most interesting case of adjacency maps occurs when
A = B = {a_0, ..., a_{n−1}}, a finite set of n elements (we have, intuitively, n
elements iff i ≠ j → a_i ≠ a_j).‡ In this case we have relations P on A, a set;
therefore any such P is a set too. The adjacency map M_P can be represented, or
“stored” (in a computer, for example), as a table, A_P, known as an adjacency
matrix, via the definition

    A_P(i, j) =def M_P((a_i, a_j))

that is, A_P is n × n and

    A_P(i, j) =def 1 if (a_i, a_j) ∈ P
                   0 otherwise

We understand i as the row index and j as the column index.

† For k = 1 the sequence z_1, ..., z_{k−1} is empty by convention; thus we just have x P y in this
case.
‡ The a_i can be thought of as values of a function f with domain n, that is a_i = f(i).

V.2.33 Example (Informal). Consider R = {(a, b), (b, c)} on A = {a, b, c},
where a ≠ b ≠ c ≠ a. Let us arbitrarily rename a, b, c as a_1, a_2, a_3 respectively
(or, equivalently, 1, 2, 3, since the a in a_i is clearly cosmetic). Then,

                | 0 1 0 |
    A_R       = | 0 0 1 |
                | 0 0 0 |

                | 1 1 0 |
    A_{r(R)}  = | 0 1 1 |
                | 0 0 1 |

                | 0 1 1 |
    A_{R^+}   = | 0 0 1 |
                | 0 0 0 |

                | 0 1 1 |
    A_{st(R)} = | 1 0 1 |
                | 1 1 0 |

                | 1 1 1 |
    A_{ts(R)} = | 1 1 1 |
                | 1 1 1 |


V.2.34 Example (Informal). Consider Δ : S → S, where S = {1, 2, ..., n}.
Then

    A_Δ(i, j) = 1 if i = j
                0 otherwise

That is, A_Δ has 1’s only on the main diagonal. This observation partly justifies
the term “diagonal relation”.


V.2.35 Example (Informal). Given R : A → A and S : A → A, where A =
{a_1, a_2, ..., a_n} and a_i ≠ a_j if i ≠ j. We can form R^{-1}, r(R), s(R), R^+, R*,
R ∪ S, R ∘ S, S ∘ R, etc. What are their adjacency matrices?
First, let us agree that in the context of adjacency matrices the operation “+”
on {0, 1} is given by†

    x + y = 1 if x + y ≥ 1
            0 otherwise


† This “+” is often called Boolean addition, for if 0, 1 are thought of as the values false and true
respectively, the operation amounts to “∨”.

Now, going from easy to hard:

(1) A_{R^{-1}} satisfies A_{R^{-1}}(i, j) = A_R(j, i) for all i, j, i.e., in matrix jargon, A_{R^{-1}}
    is the transpose of A_R.
(2) A_{R∪S} = A_R + A_S, that is, A_{R∪S}(i, j) = A_R(i, j) + A_S(i, j) for all i, j. We
    say that A_{R∪S} is the Boolean sum of A_R and A_S.
(3) In particular, A_{r(R)} = A_{Δ∪R} = A_Δ + A_R; thus we pass from A_R to A_{r(R)}
    by making all the diagonal entries equal to 1.
(4) A_{s(R)} = A_{R∪R^{-1}} = A_R + A_{R^{-1}}, so that A_{s(R)}(i, j) = A_R(i, j) + A_R(j, i)
    for all i, j.
(5) What is A_{R∘S} in terms of A_R, A_S? We have

    A_{R∘S}(i, j) = 1 iff (a_i, a_j) ∈ R ∘ S
                      iff a_j R ∘ S a_i
                      iff (∃m)(a_j R a_m ∧ a_m S a_i)
                      iff (∃m)((a_m, a_j) ∈ R ∧ (a_i, a_m) ∈ S)
                      iff (∃m)(A_S(i, m) = 1 ∧ A_R(m, j) = 1)
                      iff Σ_{m=1}^{n} A_S(i, m) · A_R(m, j) = 1
                      iff (A_S · A_R)(i, j) = 1

    Thus, A_{R∘S} = A_S · A_R (note the order reversal!).

(6) In particular, A_{R^{m+1}} = A_{R^m ∘ R} = A_R · A_{R^m}. If now we assume inductively
    that A_{R^m} = (A_R)^m (which is clearly true† for m = 0), then A_{R^{m+1}} = (A_R)^{m+1}.
    Thus A_{R^m} = (A_R)^m is true for all m ≥ 0.

(7) It follows from (2), (6), and Exercise V.23 that

    A_{R^+} = Σ_{i=1}^{n} (A_R)^i

    whereas

    A_{R*} = Σ_{i=0}^{n} (A_R)^i

    From the observation that R* = tr(R) and from Exercise V.24 one gets

    A_{R*} = (A_Δ + A_R)^{n−1}

    a simpler formula.


The reader will be asked to pursue this a bit more in the Exercises section, where
“good” algorithms for the computation of A_{R^+} and A_{R*} will be sought.
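The order reversal in item (5) is easy to get wrong, so a small computational check may help. The sketch below is an assumption-laden illustration: matrices are lists of 0/1 rows, entry (i, j) is 1 exactly when the pair (a_i, a_j) belongs to the relation, and `compose(R, S)` follows the chain of equivalences in (5), so that (a_i, a_j) is in R ∘ S when some a_m has (a_i, a_m) in S and (a_m, a_j) in R. All function names are chosen here.

```python
def adjacency(P, elems):
    """Entry (i, j) is 1 iff (elems[i], elems[j]) is in P."""
    return [[1 if (a, b) in P else 0 for b in elems] for a in elems]

def bool_product(X, Y):
    """Matrix product over {0, 1} with Boolean addition."""
    n = len(X)
    return [[1 if any(X[i][m] and Y[m][j] for m in range(n)) else 0
             for j in range(n)] for i in range(n)]

def compose(R, S):
    """R o S as in V.2.35(5): pairs (x, z) with (x, y) in S and (y, z) in R."""
    return {(x, z) for (x, y) in S for (y2, z) in R if y == y2}

elems = ["a", "b", "c"]
R = {("a", "b"), ("b", "c")}      # the relation of Example V.2.33
S = {("a", "a"), ("c", "b")}      # an arbitrary second relation

A_R, A_S = adjacency(R, elems), adjacency(S, elems)
print(A_R == [[0, 1, 0], [0, 0, 1], [0, 0, 0]])
print(adjacency(compose(R, S), elems) == bool_product(A_S, A_R))   # reversed order
print(adjacency(compose(R, S), elems) == bool_product(A_R, A_S))   # same order fails here
```

On this example the first two lines print True and the last prints False, which is the point of the remark about order.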


† We did say that this example is in the informal domain.

V.3. Algebra of Functions


V.3.1 Definition. Given functions f : A → B and g : B → A such that g ∘ f =
Δ_A. We say that g is a left inverse of f, and f is a right inverse of g.


V.3.2 Remark. We will follow in this section the convention of using f, g, h
and possibly other lowercase letters, with or without subscripts or primes, as
function names even in those cases that the functions might be proper classes.
We will continue utilizing uppercase letters for “general” relations. Any startling

deviations from this notational rule will be noted.

V.3.3 Example (Informal). Let A = {a, b}, where a ≠ b, and B = {1, 2, 3, 4}.

Consider the following functions:

    f_1 = {(a, 1), (b, 3)}
    f_2 = {(a, 1), (b, 4)}
    g_1 = {(1, a), (3, b), (4, b)}
    g_2 = {(1, a), (2, b), (3, b)
    g_3 = {(1, a), (2, b), (3, b)
    g_4 = {(1, a), (2, a), (3, b)
    g_5 = {(1, a), (3, b)}

We observe that

    g_1 ∘ f_1 = g_2 ∘ f_1 = g_3 ∘ f_1 = g_4 ∘ f_1 = g_5 ∘ f_1 = g_1 ∘ f_2 = g ∘ f_2
              = g_4 ∘ f_2 = Δ_A


What emerges is:

(1) The equation x ∘ f = Δ_A does not necessarily have unique x-solutions,
    not even when only total solutions are sought.
(2) The equation x ∘ f = Δ_A can have nontotal x-solutions. Neither a total
    nor a nontotal solution is necessarily 1-1.
(3) An x-solution to x ∘ f = Δ_A can be 1-1 without being total.
(4) The equation g ∘ x = Δ_A does not necessarily have unique x-solutions.
    Solutions do not have to be onto.

In the previous example we saw what we cannot infer about f and g from
g ∘ f = Δ_A. Let us next see what we can infer.


V.3.4 Proposition. Given f : A → B and g : B → A such that g ∘ f = Δ_A.
Then

(1) f is total and 1-1.
(2) g is onto.


Proof. (1): Since g ∘ f is total, it follows that f is too (for f(a) ↑ implies
g(f(a)) ↑). Next, let f(a) = f(b). Then g(f(a)) = g(f(b)) by Leibniz axiom;
hence g ∘ f(a) = g ∘ f(b), that is, Δ_A(a) = Δ_A(b).
Hence a = b.

(2): For ontoness of g we argue that there exists an x-solution of the equation
g(x) = a for any a ∈ A. Indeed, x = f(a) is a solution.
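The fully legible functions of Example V.3.3 can be replayed with Python dictionaries standing in for (possibly nontotal) functions. This is an informal sketch: `compose` is a name chosen here, a missing key models "undefined", and g_1 is taken with last pair (4, b), as required by g_1 ∘ f_2 = Δ_A.

```python
def compose(g, f):
    """g o f as a partial function: defined at a exactly when f(a) and g(f(a)) are."""
    return {a: g[f[a]] for a in f if f[a] in g}

A = {"a", "b"}
delta_A = {x: x for x in A}

f1 = {"a": 1, "b": 3}
f2 = {"a": 1, "b": 4}
g1 = {1: "a", 3: "b", 4: "b"}   # not total on B = {1, 2, 3, 4}, and not 1-1
g5 = {1: "a", 3: "b"}           # 1-1 but not total

print(compose(g1, f1) == delta_A, compose(g5, f1) == delta_A)   # two left inverses of f1
print(compose(g1, f2) == delta_A, compose(g5, f2) == delta_A)   # g5 is not one for f2

# Proposition V.3.4 on this instance: f1 is total and 1-1, g1 is onto A
print(set(f1) == A and len(set(f1.values())) == len(f1))
print(set(g1.values()) == A)
```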


V.3.5 Corollary. Not all functions f : A → B have left (or right) inverses.


Proof. Not all functions f : A → B are 1-1 (respectively, onto).


V.3.6 Corollary. Functions with neither left nor right inverses exist.


Proof. Any f : A → B which is neither 1-1 nor onto fills the bill. For example,
take f = {(1, 2), (2, 2)} from {1, 2} to {1, 2}.


The above proofs can be thought of as argot versions of formal proofs, since 1
and 2 can be thought of as members of (the formal) ω.


V.3.7 Proposition. If f : A → B is a 1-1 correspondence (cf. III.11.24), then
x ∘ f = Δ_A and f ∘ x = Δ_B have the unique common solution f^{-1}.


N.B. This unique common solution, f^{-1}, is called the inverse of f.


Proof. First off, it is trivial that f^{-1} is single-valued and hence a function.
Verify next that it is a common solution:

    a f ∘ f^{-1} b iff (∃c)(a f c ∧ c f^{-1} b)
                   iff (∃c)(a f c ∧ b f c)
                   iff a = b        (f is single-valued)

where the if part of the last iff is due to ontoness of f, while the only-if
part employs proof by auxiliary constant (let c work, i.e., a f c ∧ b f c ...). Thus
x = f^{-1} solves f ∘ x = Δ_B. Similarly, one can show that it solves x ∘ f = Δ_A
too.


Uniqueness of solution: Let x ∘ f = Δ_A. Then (x ∘ f) ∘ f^{-1} = Δ_A ∘ f^{-1} =
f^{-1}. By associativity of ∘, this says x ∘ (f ∘ f^{-1}) = f^{-1}, i.e., x = x ∘ Δ_B =
f^{-1}. Therefore a left inverse has to be f^{-1}. The same can be similarly shown
for the right inverse.

V.3.8 Corollary. If f : A → B has both left and right inverses, then it is a 1-1
correspondence, and hence the two inverses equal f^{-1}.


Proof. From h ∘ f = Δ_A (h is some left inverse) it follows that f is 1-1 and
total. From f ∘ g = Δ_B (g is some right inverse) it follows that f is onto.


V.3.9 Theorem (Algebraic Characterization of 1-1ness and Ontoness).


(1) f : A → B is total and 1-1 iff it is left-invertible.†
(2) g : B → A is onto iff it is right-invertible.‡


Proof. (1): The if part is Proposition V.3.4(1). As for the only-if part, note that
f^{-1} : B → A is single-valued (f is 1-1) and verify that f^{-1} ∘ f = Δ_A.

(2): The if part is Proposition V.3.4(2).

Only-if part: By ontoness of g, all the sets in the family (g^{-1}(x))_{x∈A} are
nonempty. By AC, let h ∈ ∏_{x∈A} g^{-1}(x).

Thus, h : A → B is total, and h(x) ∈ g^{-1}(x) for all x ∈ A. Now, for all x,

    h(x) ∈ g^{-1}(x) iff h(x) g^{-1} x
                     iff x g h(x)
                     iff x = g ∘ h(x)

That is, g ∘ h = Δ_A.
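For a finite onto function whose domain consists of numbers, the choice function h of part (2) can be written down explicitly by always taking the least preimage, so no appeal to AC is needed. The sketch below is an illustration with hypothetical names; g is a dictionary from a subset of the natural numbers onto A.

```python
def right_inverse(g, A):
    """For each x in A pick h(x) with g(h(x)) = x, choosing the least preimage."""
    h = {}
    for x in A:
        preimages = [b for b in g if g[b] == x]
        if not preimages:
            raise ValueError("g is not onto A")
        h[x] = min(preimages)
    return h

A = {"a", "b"}
g = {1: "a", 2: "a", 3: "b", 4: "b"}

h = right_inverse(g, A)
print(h == {"a": 1, "b": 3})
print(all(g[h[x]] == x for x in A))   # g o h is the identity on A
```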


V.3.10 Remark. If B ⊆ N, then AC is unnecessary in the proof of
Theorem V.3.9 (case (2), only-if part).

Note that Theorem V.3.9 provides an “equational” or “algebraic” definition
of ontoness and 1-1-ness (the latter for total functions).


The reader has probably observed that, given f : A → B and f^{-1} : B → A,
an easy way to figure out the correct subscript of Δ in f ∘ f^{-1} = Δ and
f^{-1} ∘ f = Δ is to draw a diagram such as


† That is, it has a left inverse.
‡ That is, it has a right inverse.

This is a trivial example of the usefulness of function diagrams. In some
branches of mathematics, such as category theory and algebraic topology,
it is an entire science to know how to manipulate complex function diagrams.


V.3.11 Informal Definition (Function Diagrams). A function diagram
consists of a finite set of points, each labeled by some class (repetition of labels
is allowed), and a finite set of arrows between points, each labeled by some
function (no repetitions allowed).

If an arrow labeled f starts (i.e., has its tail) at the (point labeled by the)
class X and ends (i.e., has its head) at the class Y, then the interpretation is that
we have a function f : X → Y.


A chain from point (labeled) A to point B in a diagram is a sequence of
arrows in the diagram such that the first starts at A, the last ends at B, and the
(i + 1)st starts where the ith ends, for all relevant i-values.


If f_1, f_2, ..., f_n are the labels of a chain in that order from beginning to end,
then we say that the chain has length n and result f_n ∘ f_{n−1} ∘ ··· ∘ f_2 ∘ f_1.


A diagram is called commutative iff any two chains with common start and
common end, of which at least one has length ≥ 2, have the same result.


V.3.12 Example (Informal: Examples of Function Diagrams).


(1) The following is commutative iff g ∘ f = h ∘ f. Note that commutativity
    does not require g = h, since both chains from B to C are of length one and
    thus the commutativity concept does not apply:

    [Diagram: an arrow f from A to B, followed by two parallel arrows g and h from B to C.]


(2) The following is commutative:


(3) Recall that π, δ denote the first and second projections π((x, y)) = x and
    δ((x, y)) = y for all x, y. Let f : C → A and g : C → B be two total
    functions. Then there is a unique total function h which can label the dotted
    arrow below and make the diagram commutative:

    [Diagram: a dotted arrow h from C to A × B, with the projections π from A × B to A
    and δ from A × B to B, and the arrows f from C to A and g from C to B.]

    This h is, of course, λx.(f(x), g(x)) : C → A × B.


Note that in drawing diagrams we do not draw the points; we only draw their
labels.

V.4. Equivalence Relations 


V.4.1 Informal Definition. A relation P : A → A is an equivalence relation
on A iff it is reflexive, symmetric, and transitive.


Thus, in the metatheory “P : A → A is an equivalence relation” means that
“P ⊆ A × A and P is reflexive, symmetric, and transitive” is true, whereas in
ZFC it means that the quoted (quasi) translation† is provable (or has been taken
as an assumption).


V.4.2 Example. As examples of equivalence relations we mention “=”, i.e.,
Δ on any class, and ≡ (mod m) on Z (see Example V.2.17).


An equivalence relation on A has the effect, intuitively, of grouping
equivalent (i.e., related) elements into equivalence classes.


† The reflexive, symmetric, and transitive properties have trivial translations in the formal language.

Why is this intuition not valid for arbitrary relations? Well, for one thing, not
all relations are symmetric, so if element a of A started up a club of “pals”
with respect to a (non-symmetric) relation P, then a would welcome b into the
club as soon as a P b holds. Now since, conceivably, b P a is false, b would not
welcome a in his club. The two clubs would be different. Now that is contrary
to the intuitive meaning of “equivalence” according to which we would like a
and b to be in the same club.

O.K., so let us throw in symmetry. Do symmetric relations group related
elements in a way we could intuitively call “equivalence”? Take the symmetric
relation ≠. If it behaved like equivalence, then a ≠ b and b ≠ c would require
all three a, b, c to belong to the same “pals’ club”, for a and b are in the same
club, and b and c are in the same club. Alas, it is conceivable that a ≠ b ≠ c,
yet a = c, so that a and c would not be in the same club. The problem is that
≠ is not transitive.

What do we need reflexivity for? Well, without it we would have “stray”
elements (of A) which belong to no clubs at all, and this is undesirable intuitively.
For example, R = {(1, 2), (2, 1), (1, 1), (2, 2)} is symmetric and transitive on
A = {1, 2, 3}. We have exactly one club, {1, 2}, and 3 belongs to no club. We
fix this by adding (3, 3) to R, so that 3 belongs to the club {3}.

As we already said, intuitively we view related elements of an equivalence
relation as indistinguishable. We collect them in so-called equivalence classes
(the “clubs”) which are therefore viewed intuitively as a kind of “fat urelements”
(their individual members lose their “individuality”).


Here are the technicalities. 


V.4.3 Informal Definition. Given an equivalence relation P on A. The
equivalence class of an element a ∈ A is {x ∈ A : x P a}. We use the symbol [a]_P,
or just [a] if P is understood, for the equivalence class.

If P and A are sets, then A/P, the quotient set of A with respect to P, is
the set of all equivalence classes [a]_P.


(1) Restricting the definition of A/P to sets P, A ensures that [x]_P are sets
    (why?) so that A/P makes sense as a class. Indeed, it is a set, by collection.
(2) Of course, [a]_P = P[{a}] = P(a).


V.4.4 Lemma. Let P be an equivalence relation on A. Then [x] = [y] iff x P y.


Proof. If part. Let z ∈ [x]. Then z P x. Hence z P y by assumption and
transitivity. That is, z ∈ [y], from which [x] ⊆ [y].
By swapping letters we have y P x implies [y] ⊆ [x]; hence (by symmetry
of P) our original assumption, namely x P y, implies [y] ⊆ [x]. All in all,
[x] = [y].

Only-if part. By reflexivity, x ∈ [x]. The assumption then yields x ∈ [y],
i.e., x P y.


V.4.5 Lemma. Let P be an equivalence relation on A. Then

(i) [x] ≠ ∅ for all x ∈ A.
(ii) [x] ∩ [y] ≠ ∅ implies [x] = [y] for all x, y in A.
(iii) ⋃_{x∈A} [x] = A.


Proof. (i): From x P x for all x ∈ A we get x ∈ [x].
(ii): Let z ∈ [x] ∩ [y]. Then z P x and z P y; therefore x P z and z P y (by
symmetry); hence x P y (by transitivity). Thus, [x] = [y] by Lemma V.4.4.
(iii): The ⊆-part is obvious from [x] ⊆ A. The ⊇-part follows from
⋃_{x∈A} {x} = A and {x} ⊆ [x].


The properties (i)–(iii) are characteristic of the notion of a partition of a set.
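The "clubs" discussion and Lemma V.4.5 can be replayed on the three-element example given earlier. In this informal sketch a Python pair (x, a) is read as "x P a", and `equivalence_class` is a name chosen for illustration; the first relation is the symmetric and transitive but non-reflexive R from the text, the second adds the pair (3, 3).

```python
def equivalence_class(a, P, A):
    """[a]_P = {x in A : x P a}, reading the pair (x, a) as 'x P a'."""
    return frozenset(x for x in A if (x, a) in P)

A = {1, 2, 3}
R = {(1, 2), (2, 1), (1, 1), (2, 2)}   # symmetric and transitive, not reflexive on A
E = R | {(3, 3)}                       # now an equivalence relation on A

print(equivalence_class(3, R, A))      # the empty class: 3 is a "stray" element
classes = {equivalence_class(a, E, A) for a in A}
print(classes == {frozenset({1, 2}), frozenset({3})})

# properties (i)-(iii) of Lemma V.4.5 for E
assert all(c for c in classes)                                        # (i) nonempty
assert all(c == d or not (c & d) for c in classes for d in classes)   # (ii)
assert set().union(*classes) == A                                     # (iii)
```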


V.4.6 Definition (Partitions). Let (F,)a<; be a family of subsets of A. It is a 
partition of A iff all of the following hold: 


(i) Fy 4 @foralla el. 
(ii) Fa Fy 4 O implies F, = F; for alla, bin I. 
it) |g Fa 


There is a natural affinity between equivalence relations and partitions on a 
set A. 


V.4.7 Theorem. The relation C = {(R, A/R) : R is an equivalence relation 
on A} is a 1-1 correspondence between the set & of all equivalence relations 
on A and the set F of all partitions on A. 


our Are all the “sets” mentioned in the theorem indeed sets? 


Proof. By definition, C is single-valued (in the second coordinate) on ℰ and total,
since whenever R occurs in (R, x), x is always the same, namely, A/R. Moreover,
for each R ∈ ℰ, A/R exists. By Lemma V.4.5 ran(C) ⊆ 𝒫, so that we
have a total function C : ℰ → 𝒫 so far.

We next check ontoness:


Let Π = (F_a)_{a∈I} ∈ 𝒫. Define a relation Π′ on A as follows:

    x Π′ y iff (∃a ∈ I) {x, y} ⊆ F_a

Observe that:


(i) Π′ is reflexive: Take any x ∈ A. By V.4.6(iii), there is an a ∈ I such that
x ∈ F_a, and hence {x, x} ⊆ F_a. Thus x Π′ x.
(ii) Π′ is, trivially, symmetric.
(iii) Π′ is transitive: Indeed, let x Π′ y Π′ z. Then {x, y} ⊆ F_a and {y, z} ⊆ F_b
for some a, b in I. Thus, y ∈ F_a ∩ F_b; hence F_a = F_b, by V.4.6(ii). Hence
{x, z} ⊆ F_a; therefore x Π′ z.


So Π′ is an equivalence relation. Once we show that C(Π′) = Π, i.e., A/Π′ =
Π, we will have settled ontoness.

⊆: Let [x] be arbitrary in A/Π′ (we use [x] for [x]_Π′). Take F_a such that
x ∈ F_a (it exists by V.4.6(iii)). Now let z ∈ [x]. Then z Π′ x; hence z, x are in
the same F_b, which is F_a by V.4.6(ii). Hence z ∈ F_a; therefore [x] ⊆ F_a.

Conversely, if z ∈ F_a, since also x ∈ F_a, then z Π′ x; thus z ∈ [x]. All in all,
[x] = F_a, where F_a is the unique F_b containing x. Thus [x] ∈ Π.


⊇: Let F_a be arbitrary in Π. By V.4.6(i), there is some x ∈ F_a. By the same
argument as in the ⊆-part, [x] = F_a; thus F_a ∈ A/Π′.


For 1-1-ness, let Π and Π′ be as before, and let also R ∈ ℰ be such that A/R = Π.
If x R y, then [x]_R = [y]_R. Let [x]_R = F_a for some a ∈ I; thus x and y are in
F_a, that is, x Π′ y. The argument is clearly reversible, so R = Π′.
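The correspondence of Theorem V.4.7 can be tried out on a small finite set. The following Python sketch is only an illustration under stated assumptions: the helper names `quotient`, `relation_of` and `is_partition` are hypothetical, relations are sets of pairs, and partitions are sets of frozensets.

```python
from itertools import product

def quotient(A, R):
    """A/R: the set of classes [x] = {y in A : y R x}, for x in A."""
    return {frozenset(y for y in A if (y, x) in R) for x in A}

def relation_of(partition):
    """The relation 'x ~ y iff {x, y} is included in some block'."""
    return {(x, y) for block in partition for x, y in product(block, repeat=2)}

def is_partition(A, blocks):
    """Check conditions (i)-(iii) of V.4.6 for a set of frozensets."""
    nonempty = all(len(b) > 0 for b in blocks)
    disjoint = all(b == c or not (b & c) for b in blocks for c in blocks)
    covers = set().union(*blocks) == set(A)
    return nonempty and disjoint and covers

A = set(range(6))
R = {(x, y) for x in A for y in A if (x - y) % 3 == 0}  # congruence mod 3
blocks = quotient(A, R)
```

Going from R to A/R and back with `relation_of` returns R, and going from a partition to its relation and back with `quotient` returns the partition, which is the finite shadow of the 1-1 and onto claims.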


V.4.8 Example (Informal). The equivalence relation ≡_m on Z determines the
quotient set Z/≡_m = {{i + k·m : k ∈ Z} : i ∈ Z ∧ 0 ≤ i < m}. We usually
denote Z/≡_m by Z_m.


V.4.9 Example. Given f : A → B. Define R_f by

    x R_f y iff f(x) ≃ f(y)    (1)

where ≃ is the weak equality of Kleene (see III.11.17).
R_f is an equivalence relation:

(i) For all x ∈ A we have f(x) ≃ f(x); hence x R_f x (this would have failed
whenever f(x) ↑ if we had used = rather than ≃ in (1)).
(ii) R_f is trivially symmetric.
(iii) R_f is transitive: x R_f y R_f z means f(x) ≃ f(y) ≃ f(z), and hence f(x) ≃
f(z), and hence x R_f z.

We can, intuitively, “lump” or “identify” all “points” in A that map into the
same element of B, thus, in essence, turning f into a 1-1 function. We also
lump together all points of A for which f is undefined. All this is captured by
the following commutative diagram:

                f
        A ------------> B
         \             ^
        h \           / g
           v         /
            A/R_f

        h = λx.[x]
        g = λ[x].f(x)


We observe:


(a) λx.[x] is total and onto. It is called the natural projection of A onto A/R_f.†

(b) [x] ↦ f(x) is single-valued,‡ for if [x] = [y], then x R_f y and thus
f(x) ↑ ∧ f(y) ↑ ∨ (∃z)(z = f(x) ∧ z = f(y)).

(c) The function [x] ↦ f(x) is defined iff f(x) ↓ (trivial).

(d) [x] ↦ f(x) is 1-1. For, let ([x], a) and ([y], a) be pairs of this function.
The first pair implies f(x) = a, and the second implies f(y) = a, thus
f(x) = f(y), and hence f(x) ≃ f(y). It follows that x R_f y, and hence
[x] = [y].

(e) Let h = λx.[x] and g = λ[x].f(x). Then g ∘ h(x) = g(h(x)) = g([x]) =
f(x).


This verifies the earlier claim that the above diagram is commutative.

(f) The moral, in words, is: “Every function f : A → B can be decomposed
into a 1-1 and an onto total function, in that left-to-right order. Moreover,
the 1-1 component is total iff f is.”


† The term applies to the general case λx.[x] : A → A/R, not just for the special R = R_f above.
‡ In this context one often says “well-defined”, i.e., the image f(x) is independent of the
representative x which denotes (defines) the equivalence class [x].
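The decomposition f = g ∘ h of Example V.4.9 can be replayed for a finite partial function. This is a minimal sketch under assumptions that are not in the text: f is a Python dict defined on part of A, the sentinel `_UNDEFINED` stands for “f(x) ↑”, and `factor` is a hypothetical helper name.

```python
_UNDEFINED = object()  # stands for "f(x) is undefined"

def factor(A, f):
    """Split a partial function f (a dict on part of A) as f = g o h."""
    def value(x):
        return f.get(x, _UNDEFINED)
    # h: the natural projection x |-> [x], where x R_f y iff f(x) ≃ f(y)
    h = {x: frozenset(y for y in A if value(y) == value(x)) for x in A}
    # g: [x] |-> f(x), defined exactly on the classes where f is defined
    g = {h[x]: f[x] for x in A if x in f}
    return h, g

A = {0, 1, 2, 3, 4, 5}
f = {0: "a", 1: "b", 2: "a", 3: "b"}  # undefined at 4 and 5
h, g = factor(A, f)
```

Here h is total and onto the three classes {0, 2}, {1, 3} and {4, 5}; g is 1-1 and is undefined on the class {4, 5} of points where f is undefined, matching observations (a)-(e).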


V.5. Exercises


V.1. Course-of-values induction. Prove that for any formula ℱ(x),

    ZFC ⊢ (∀n ∈ ω)((∀m < n ∈ ω)ℱ(m) → ℱ(n)) → (∀n ∈ ω)ℱ(n)

or, in words, if for the arbitrary n ∈ ω we can prove ℱ(n) on the induction
hypothesis that ℱ(m) holds for all m < n, then this is as good as
having proved (∀n ∈ ω)ℱ(n).

(Hint. Assume (∀n ∈ ω)((∀m < n ∈ ω)ℱ(m) → ℱ(n)) to prove (∀n ∈
ω)ℱ(n). Consider the formula 𝒢(n) defined as (∀m < n ∈ ω)ℱ(m),
and apply (ordinary) induction on n to prove that (∀n ∈ ω)𝒢(n). Take
it from there.)

V.2. The “least” number principle over ω. Prove that every ∅ ≠ A ⊆ ω has
a minimal element, i.e., an n ∈ A such that for no m ∈ A is it possible to
have m < n. Do so without foundation, using instead course-of-values
induction.

V.3. Prove that the principle of induction over ω and the least number principle
are equivalent, i.e., one implies the other. Again, do so without using
foundation.

V.4. Redo the proof of Theorem V.1.21 (existence part) so that it would go
through even if trichotomy of ∈ over ω did not hold.

V.5. Prove that a set x is a natural number iff it satisfies (1) and (2) below:
(1) It and all its members are transitive.
(2) It and all its members are successors or 0.

V.6. Prove that for all m, n, i in ω, m + (n + i) = (m + n) + i.

V.7. Redo the proof of V.1.24 (commutativity of natural number addition) by
a single induction, relying on the associativity of addition.

V.8. Prove that for all m, n in ω, m < n implies m + 1 ≤ n (recall that < on ω is
the same as ∈).

V.9. Prove that for all m, n in ω, m < n implies m + 1 < n + 1.

V.10. Show by an appropriate example that if f, g are finite sequences, then
f ∗ g ≠ g ∗ f in general.

V.11. Define multiplication, “·”, on ω by

    m · 0 = 0
    m · (n + 1) = m · n + m

Prove:
(1) · is associative.
(2) · is commutative.
(3) · distributes over +, i.e., for all m, n, k, (m + n) · k = (m · k) + (n · k).

V.12. Prove that m + n < (m + 1) · (n + 1) for all m, n in ω.

V.13. Let P be both symmetric and antisymmetric. Show that P ⊆ Δ_A, where
A is the field of P. Conclude that P is transitive.

V.14. In view of the previous problem, explore the patterns of independence
between reflexivity, symmetry, antisymmetry, transitivity, and irreflexivity.

V.15. For any relation P, (P⁻¹)⁻¹ = P.

V.16. Let R: A → A be a relation (set). Define

    P = {S ⊆ A × A : R ⊆ S ∧ S is reflexive}
    Q = {S ⊆ A × A : R ⊆ S ∧ S is symmetric}
    T = {S ⊆ A × A : R ⊆ S ∧ S is transitive}

Show that

    r(R) = ⋂P
    s(R) = ⋂Q
    t(R) = ⋂T

V.17. Fill in the missing details of Example V.2.29.

V.18. If R is on A = {1, ..., n}, then show that

    R⁺ = ⋃_{i=1}^{n} R^i

V.19. If R is reflexive on A = {1, ..., n}, then

    R⁺ = R^n

V.20. Show that for any P: A → A, s(r(P)) = r(s(P)).

V.21. Show by an appropriate example that, in general, s(t(P)) ≠ t(s(P)).

V.22. Given R on A = {1, ..., n}.

(a) Prove by an appropriate induction that the following algorithm
terminates with the value A_{R⁺} in the matrix variable M:

    M ← A_R
    for i = 1 to n − 1 do
        M ← (A_Δ + M) · A_R

(b) Show that the above algorithm performs O(n⁴) operations of the
type “+” and “·”.

f(n) = O(g(n)) means |f(n)| ≤ C · |g(n)| for all n ≥ n₀ (for some
constants C, n₀ independent of n), or, equivalently, |f(n)| ≤ C · |g(n)| + D
for all n ≥ 0 (for some constants C, D independent of n).

V.23. (a) Prove that if R is on A = {1, ..., n}, then R⁺ = ⋃_{i=1}^{m} R^i for all
m ≥ n.

(b) Based on the above observation, on Example V.2.35, and on the fact
(with proof, of course) that for any matrix M we can find M^(2^k) in k
matrix multiplications, find an algorithm that provably computes R⁺
in O(n³ log n) operations of the type “+” and “·”.

V.24. Prove by appropriate inductions that the following algorithm due to
Warshall terminates with the value A_{R⁺} in the matrix variable M, and
that all this is done in O(n³) “+”-operations (there are no “·”-operations
in this algorithm):

    M ← A_R
    for j = 1 to n do
        for i = 1 to n do
            if M(i, j) = 1 then
                for k = 1 to n do
                    M(i, k) ← M(i, k) + M(j, k)
            fi

Is the algorithm still correct if the first two loops are interchanged (i.e.,
if i is controlled by the outermost loop, rather than j)?

(Hint. When M(i, j) = 1, M(i, k) ← M(i, k) + M(j, k) says the same
thing as M(i, k) ← M(i, k) + M(i, j) · M(j, k).)
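For experimenting with Exercises V.22 and V.24, the two algorithms can be transcribed into Python. This is a sketch under assumptions: matrices are lists of rows with 0/1 entries, “+” and “·” are read as Boolean “or” and “and”, indices run from 0 rather than 1, and the function names are hypothetical. It checks agreement on examples only and is no substitute for the requested proofs.

```python
def warshall(adj):
    """Transcription of the loop of Exercise V.24 with Boolean sums."""
    n = len(adj)
    m = [[bool(v) for v in row] for row in adj]
    for j in range(n):
        for i in range(n):
            if m[i][j]:
                for k in range(n):
                    m[i][k] = m[i][k] or m[j][k]
    return m

def closure_by_powers(adj):
    """Transcription of Exercise V.22: M <- (I + M) . A_R, repeated n-1 times."""
    n = len(adj)
    a = [[bool(v) for v in row] for row in adj]
    m = [row[:] for row in a]
    for _ in range(n - 1):
        i_plus_m = [[m[i][j] or i == j for j in range(n)] for i in range(n)]
        m = [[any(i_plus_m[i][t] and a[t][j] for t in range(n))
              for j in range(n)] for i in range(n)]
    return m

path = [[0, 1, 0, 0],
        [0, 0, 1, 0],
        [0, 0, 0, 1],
        [0, 0, 0, 0]]
closure = warshall(path)
```

On the path 0 → 1 → 2 → 3 both functions return the matrix of the relation “i comes strictly before j”.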

V.25. Prove that for sets P, A, where P is an equivalence relation on A, A/P
is a set, as Definition V.4.3 wants us to believe.


VI 


Order 


This chapter contains concepts that are fundamental for the further development 
of set theory, such as well-orderings and ordinals. The latter constitute the 
skeleton of set theory, as they formalize the intuitive concept of “stages” and, 
among other things, enable us to make transfinite constructions formally (such 
as the construction of the universe of sets and atoms, U_M, and the constructible
universe, L_M).


VI.1. PO Classes, LO Classes, and WO Classes

We start with the introduction of the most important type of binary relation,
that of partial order.

VI.1.1 Definition. A relation P is a partial order, or just order, iff it is


(1) irreflexive (i.e., x P y → ¬x = y) and
(2) transitive.


It is emphasized that P need not be a set. 


VI.1.2 Remark. 


(1) The symbol < will be used to denote any unspecified order P, and it will be
pronounced “less than”. It is hoped that the context will not allow confusion
with the concrete < on numbers (say, on the reals).

(2) If the field of the order < is a subclass of A, then we say that < is an order
on A.

(3) Clearly, for any order < and any class B, < ∩ (B × B), or < | B, is an
order on B.


VI.1.3 Example (Informal). The concrete “less than”, <, on N is an order, but
≤ is not (it is not irreflexive). The “greater than” relation, >, on N is also an
order, but ≥ is not.


In general, it is trivial to verify that P is an order iff P⁻¹ is an order.


VI.1.4 Example. ∅ is an order. Since for any A we have ∅ ⊆ A × A, ∅ is an
order on A for the arbitrary A.


VI.1.5 Example. The relation ∈ (strictly speaking, the relation defined by the
formula x ∈ y; see III.11.2) is irreflexive by the foundation axiom. It is not
transitive, though. For example, if a is a set (or atom), then a ∈ {a} ∈ {{a}} but
a ∉ {{a}}.

Let A = {∅, {∅}, {∅, {∅}}, {∅, {∅}, {∅, {∅}}}}. The relation ε = ∈ ∩ (A × A)
is transitive and irreflexive; hence it is an order (on A).
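Both halves of Example VI.1.5 can be checked mechanically by modelling hereditarily finite sets as Python frozensets. The sketch below is an illustration only; the names `zero` through `three`, `eps`, and the helper predicates are assumptions made for this example.

```python
def is_irreflexive(rel):
    return all(x != y for x, y in rel)

def is_transitive(rel):
    return all((x, w) in rel for (x, y) in rel for (z, w) in rel if y == z)

zero = frozenset()                    # ∅
one = frozenset({zero})               # {∅}
two = frozenset({zero, one})          # {∅, {∅}}
three = frozenset({zero, one, two})   # {∅, {∅}, {∅, {∅}}}

A = {zero, one, two, three}
eps = {(x, y) for x in A for y in A if x in y}      # ∈ restricted to A

# The three-element chain a ∈ {a} ∈ {{a}} with a = ∅
B = {zero, frozenset({zero}), frozenset({frozenset({zero})})}
eps_B = {(x, y) for x in B for y in B if x in y}
```

On A the restricted membership relation has six pairs and passes both checks, while on the chain B transitivity fails because ∅ is not a member of {{∅}}.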


VI.1.6 Example. ⊂ is an order; ⊆, failing irreflexivity, is not.


VI.1.7 Definition. Let < be a partial order on A. We use the abbreviation ≤ for
r_A(<) = Δ_A ∪ <. We pronounce ≤ “less than or equal”. r_A(>), i.e., r_A(<⁻¹),
is denoted by ≥ and is pronounced “greater than or equal”.


(1) In plain English, given < on A, we define x ≤ y to mean x < y ∨ x = y
for all x, y in A.

(2) The definition of ≤ depends on A, due to the presence of Δ_A. There is no
such dependence on a “reference” or “carrier” class in the case of <.


VI.1.8 Lemma. For any <: A → A, the associated ≤ on A is reflexive,
antisymmetric, and transitive.


Proof. (1) Reflexivity is trivial.

(2) For antisymmetry, let x ≤ y and y ≤ x. If x = y then we are done, so
assume the remaining case x ≠ y (i.e., (x, y) ∉ Δ_A). Then the hypothesis
becomes x < y and y < x; therefore x < x by transitivity, contradicting the
irreflexivity of <.

(3) As for transitivity, let x ≤ y and y ≤ z. If x = z, then x ≤ z (see the
remark following VI.1.7) and we are done. The remaining case is x ≠ z. Now,
if x = y or y = z, then we are done again; so it remains to consider the case
x < y and y < z. By transitivity of < we get x < z, and hence x ≤ z, since
< ⊆ ≤.


VI.1.9 Lemma. Let P on A be reflexive, antisymmetric, and transitive. Then
P − Δ_A is an order on A.


Proof. Since

    P − Δ_A ⊆ P    (1)

it is clear that P − Δ_A is on A. It is also clear that it is irreflexive. We only need
verify that it is transitive.

So let

    (x, y) and (y, z) be in P − Δ_A    (2)

By (1)

    (x, y) and (y, z) are in P    (3)

Hence

    (x, z) ∈ P

by transitivity of P.
Can (x, z) ∈ Δ_A, i.e., can x = z? No, for antisymmetry of P and (3) would
imply x = y, i.e., (x, y) ∈ Δ_A, contrary to (2).
So (x, z) ∈ P − Δ_A.
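Lemmata VI.1.8 and VI.1.9 say that passing from < to ≤ = Δ_A ∪ < and from ≤ back to ≤ − Δ_A are mutually inverse. A small finite check, with proper divisibility on {1, ..., 12} chosen here purely as an assumed example and with hypothetical helper names, might look as follows.

```python
def reflexive_closure(A, lt):
    """r_A(<) = Δ_A ∪ <."""
    return set(lt) | {(x, x) for x in A}

def strict_part(le):
    """≤ − Δ_A."""
    return {(x, y) for (x, y) in le if x != y}

def is_reflexive(A, rel):
    return all((x, x) in rel for x in A)

def is_antisymmetric(rel):
    return all(x == y for (x, y) in rel if (y, x) in rel)

def is_transitive(rel):
    return all((x, w) in rel for (x, y) in rel for (z, w) in rel if y == z)

A = range(1, 13)
lt = {(x, y) for x in A for y in A if x != y and y % x == 0}  # x properly divides y
le = reflexive_closure(A, lt)
```

The derived relation `le` is reflexive, antisymmetric and transitive, and `strict_part(le)` recovers `lt` exactly.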


VI.1.10 Remark. Often in the literature ≤: A → A is defined as a partial order
by the requirements that it be reflexive, antisymmetric, and transitive. Then < is
obtained as in Lemma VI.1.9, namely, as ≤ − Δ_A. Lemmata VI.1.8 and VI.1.9
show that the two approaches are interchangeable, but the modern approach
of Definition VI.1.1 avoids the nuisance of tying the notion of order to some
particular carrier class A. For us “≤” is the derived notion from VI.1.7.


VI.1.11 Informal Definition. If < is an order on a class A, we call the pair†
(A, <) a partially ordered class, or PO class. If < is an order on a set A, then
we call the pair (A, <) a partially ordered set or PO set. Often, if the order <
is understood as being on the class A or the set A, one says that “A is a PO class”
or “A is a PO set” respectively.


† Formally, (A, <) is not an ordered pair (...), for A may be a proper class. We may think then
of “(A, <)” as informal notation that simply “ties” A and < together. If we were absolutely
determined to, then we could introduce pairing with proper classes as components, for example
as (A, B) = (A × {0}) ∪ (B × {1}). For our part we will have no use for such pair types and will
consider (A, <) in the informal sense.


VI.1.12 Example (Informal). Consider the order ⊂ once more. In this case
we have none of {∅} ⊂ {{∅}}, {{∅}} ⊂ {∅} or {{∅}} = {∅}. That is, {∅} and {{∅}}
are non-comparable items. This justifies the qualification partial for orders in
general (Definition VI.1.1).


On the other hand, the “natural” < on N is such that one of x = y, x < y,
y < x always holds for any x, y (trichotomy). That is, all (unordered) pairs
x, y of N are comparable under <. This is a concrete example of a total order.
Another example is ∈ on ω (V.1.20).


While all orders are partial orders, some are total (< above) and others are
nontotal (⊂ above).


VI.1.13 Definition. A relation < on A is a total or linear order on A iff


(1) it is an order, and
(2) for any x, y in A one of x = y, x < y, y < x holds (trichotomy).


If A is a class, then the pair (A, <) is a linearly ordered class, or LO class.
If A is a set, then the pair (A, <) is a linearly ordered set, or LO set. One often
calls just A a LO class or LO set (as the case warrants) when < is understood
from the context.


VI.1.14 Example (Informal). The standard <: N → N is a total order; hence
(N, <) is a LO set.


VI.1.15 Definition. Let < be an order and A some class. An element a ∈ A is
a <-minimal element in A, or a <-minimal element of A, iff ¬(∃x ∈ A)x < a.


m ∈ A is a <-minimum element in A iff (∀x ∈ A)m ≤ x.


We also use the terminology minimal or minimum with respect to <, instead 
of <-minimal or <-minimum. 


If a ∈ A is >-minimal in A, that is, ¬(∃x ∈ A)x > a, we call a a <-maximal
element in A. Similarly, a >-minimum element is called a <-maximum element.


If the order < is understood, then the qualification “<-” is omitted. 


In particular, if a € A is not in the field of <, then a is both <-minimal and 
<-maximal in A. 


Note that minimality with respect to < in A has the interesting formulation
<(a) ∩ A = ∅, which, if < is on A, simplifies further to <(a) ↑. In this
light, the “general case” also reads (< | A)(a) ↑ (see III.11.9), i.e., a ∈ A is
<-minimal iff the (relational) restriction of < on A is undefined at a.


Because of the duality between the notions of minimal/maximal and
minimum/maximum, we will mostly deal with the <-notions, whose results
can be trivially translated for the >-notions.


VI.1.16 Example (Informal). 0 is minimal, and also minimum, in N with
respect to the natural ordering.


In 𝒫(N), ∅ is both ⊂-minimal and ⊂-minimum. On the other hand, all of
{0}, {1}, {2} are ⊂-minimal in 𝒫(N) − {∅} but none are ⊂-minimum in that set.


Observe from this last example that minimal elements in a class are not in 
general unique. 
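The distinction between minimal and minimum elements (Definition VI.1.15) is easy to see on a finite stand-in for Example VI.1.16. In this sketch the power set of {0, 1, 2} replaces 𝒫(N), and the helper names are assumptions made for the illustration.

```python
from itertools import combinations

def powerset(s):
    items = list(s)
    return {frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)}

def minimal_elements(A, lt):
    """a is minimal iff no x in A satisfies x < a."""
    return {a for a in A if not any(lt(x, a) for x in A)}

def minimum_elements(A, lt):
    """m is minimum iff m = x or m < x for every x in A."""
    return {m for m in A if all(m == x or lt(m, x) for x in A)}

def proper_subset(x, y):
    return x < y  # strict inclusion on frozensets

subsets = powerset({0, 1, 2})
nonempty = subsets - {frozenset()}
```

With ∅ present it is the unique minimal element and also the minimum; once ∅ is removed, the three singletons are all minimal and no minimum exists.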


VI.1.17 Lemma. Given an order < and a class A.


(1) If m is a minimum in A, then it is also minimal.
(2) If m is a minimum in A, then it is unique.


Proof. (1): Assume

    (∀x ∈ A)(m = x ∨ m < x)    (i)

and prove ¬(∃x ∈ A)x < m.

Well, assume (∃x ∈ A)x < m instead, and introduce a new constant a with
the assumption a < m ∧ a ∈ A. By (i), m = a ∨ m < a. Now, by irreflexivity,
case m = a is ruled out. But then case m < a and transitivity yield a < a,
which contradicts irreflexivity.


(2): Let m and n be minima† in A. Then m ≤ n (with m posing as minimum)
and n ≤ m (now n is so posing); hence m = n by antisymmetry
(Lemma VI.1.8).


VI.1.18 Example. Let m be <-minimal in A. Let us attempt to show that
it is also <-minimum (this is, of course, doomed to fail due to VI.1.16 and
VI.1.17(2), but the false proof below is interesting).

By VI.1.15 we have ¬(∃x ∈ A)x < m. That is, (∀x ∈ A)¬x < m, i.e.,
(∀x ∈ A)m ≤ x, which says that m is <-minimum in A.


The error is in the last step, where ¬x < m and m = x ∨ m < x were
taken to be equivalent, i.e., we unjustifiably assumed trichotomy or totalness
of the order “<”. As we have seen (VI.1.16), it is possible to prove all three of
¬x < m, ¬x = m, ¬m < x for some orders and appropriate x and m.


VI.1.19 Lemma. If < is a linear order on A, then every minimal element is
also minimum.


Proof. The false proof of the previous example is valid under the present
circumstances.


† Plural of minimum.


Much is to be gained, especially for work we will be doing in the next 
section, if we generalize the notion of “minimal” — and, dually, “maximal” — 
(Definition VI.1.15) to make it relevant to any relation P, even one that is not 
necessarily an order. 


VI.1.20 Definition. Let P be some relation and A some class.
We say that a ∈ A is P-minimal in A, or a P-minimal element of A, iff
P | A(a) ↑.†


If a is P⁻¹-minimal in A, then we call it P-maximal in A.


VI.1.21 Remark. Clearly, Definition VI.1.15 is a special case of VI.1.20 when
P is an order. For a ∈ A the condition P | A(a) ↑ is equivalent to A ∩ P(a) = ∅,
since

    P | A(a) = ∅            if a ∉ A
    P | A(a) = A ∩ P(a)     otherwise


The following type of relation has crucial importance for set theory, and 
mathematics in general. 


VI.1.22 Informal Definition. A relation P (not necessarily an order) satisfies 
the minimal condition (briefly, it has MC) iff every nonempty A has P-minimal 
elements. 

If a total order <: A → A has MC, then it is a well-ordering on (or of) the
class A.

If (A, <) is a LO class (or set) with MC, then it is a well-ordered, or WO, 
class (or set). 


† As in the case where we wrote “<” for “P” (VI.1.15), the symbol “|” is taken to have higher
priority than “(a)” and “↑”; thus “P | A(a) ↑” means “(P | A)(a) ↑”.


VI.1.23 Remark (The Formalities: Exegesis). The above informal definition
is worded semantically (most informal definitions are), saying what an inhabitant
of U_M ought to look for to recognize that a relation P has MC. Since we
insist on certifying truths via proofs in ZFC (notwithstanding our knowledge
that not all truths are so certifiable), operationally, we have so certified some
relation P, and proclaimed that “P has MC”, just in case we have proved the
schema†

    ∅ ≠ A → (∃x ∈ A) A ∩ P(x) = ∅    (1)

Correspondingly, the phrase “let P have MC” is argot for the phrase “add the
axiom schema (1)”.

In the present connection, if we set A = {x : 𝒜[x]}, schema (1) translates
into

    (∃x)𝒜[x] → (∃x)(𝒜[x] ∧ ¬(∃y)(y P x ∧ 𝒜[y]))    (2)

and each specific formula 𝒜 provides an instance (see also VI.1.25 below).

The reader will immediately note that (2) generalizes the foundation schema:
Foundation is just the formal translation of the phrase “the (relation) ∈, i.e.,
{(x, y) : y ∈ x} where the “∈” in “{...}” is the nonlogical predicate of L_Set,
has MC”.

This discussion is also meant to caution that the casualness of Definition
VI.1.22 does not hide between the lines quantification over a class term (A),
a thing we are not allowed to do.


Clearly, ∅ has MC. So, every relation can be “cut down” to a point that it
has MC (if necessary, cut it down all the way to ∅). One interesting way to cut
down a relation is by the process of restriction.


We say that P has MC over A just in case P | A has MC.


The term “PO set” (also “poset”) is standard. “LO set” is not much in
circulation, but “WO set” has occurred elsewhere (Jech (1978b)). By analogy we
have introduced the (non-standard) nomenclature PO class, LO class, and WO
class.


VI.1.24 Proposition. The condition “P has MC over A” is provably equivalent
to‡

    ∅ ≠ B ⊆ A → (∃x ∈ B) B ∩ P(x) = ∅    (1)


Proof. Using the deduction theorem, we have two directions to prove:


→: Let us first assume that P has MC over A. This means that P | A has
MC (by definition, VI.1.23), and therefore our assumption amounts to adding
the schema below (cf. VI.1.23):

    ∅ ≠ B → (∃x ∈ B) B ∩ (P | A(x)) = ∅    (1a)

Next, we focus on one unspecified B (so-called “arbitrary”). Now, after adding
B ≠ ∅ and B ⊆ A, (1a) yields

    (∃x ∈ B) B ∩ P(x) = ∅

using Remark VI.1.21. This proves (1).


† Here x ∈ A; thus P | A(x) = A ∩ P(x); see VI.1.21. A provable schema is, of course, one such
that all its instances are (here, ZFC) theorems.

‡ Where “∅ ≠ B ⊆ A” is short for “∅ ≠ B ∧ B ⊆ A”, i.e., utilizing the connectives “≠” and “⊆”
conjunctionally.


←: Conversely, assume (add) (1), fix B, and let ∅ ≠ B (it is not assumed
that B ⊆ A).


We want to prove

    (∃x ∈ B) B ∩ (P | A(x)) = ∅    (2)

Case B ∩ A = ∅. Then

    x ∈ B → B ∩ (A ∩ P(x)) = ∅

Therefore

    x ∈ B → B ∩ (P | A(x)) = ∅

by VI.1.21, which yields (2) by ∃-monotonicity (I.4.23) and modus
ponens.

Case B ∩ A ≠ ∅. By (1),

    (∃x ∈ B ∩ A) (B ∩ A) ∩ P(x) = ∅    (3)

since B ∩ A ⊆ A. Therefore (2) is deduced again, since (B ∩ A) ∩ P(x) =
B ∩ (P | A(x)) for x ∈ A, and the quantification in (3) can be changed to (∃x ∈ B).


Thus under both cases, the assumption ∅ ≠ B yields (2), i.e., P has MC
over A.
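For a finite A, schema (1) of Proposition VI.1.24 can be tested by brute force over all nonempty B ⊆ A. The sketch below assumes a relation stored as a set of pairs (y, x) read “y P x”, so that P(x) = {y : y P x}; the function names and the two sample relations are illustrative choices, not part of the text.

```python
from itertools import combinations

def nonempty_subsets(A):
    items = list(A)
    for r in range(1, len(items) + 1):
        for c in combinations(items, r):
            yield set(c)

def has_mc_over(P, A):
    """Check: every nonempty B ⊆ A has some x in B with B ∩ P(x) = ∅."""
    def image(x):
        return {y for (y, z) in P if z == x}   # P(x)
    return all(any(not (B & image(x)) for x in B) for B in nonempty_subsets(A))

# y P x iff y properly divides x, on {1, ..., 6}
divides = {(y, x) for x in range(1, 7) for y in range(1, 7)
           if y != x and x % y == 0}
# a three-element cycle: 1 P 2, 2 P 3, 3 P 1
cycle = {(1, 2), (2, 3), (3, 1)}
```

Proper divisibility has MC over {1, ..., 6}. The cycle fails over {1, 2, 3}, since B = {1, 2, 3} has no minimal element, but it does have MC over the smaller class {1, 2}.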


VI.1.25 Corollary. That P has MC over A is provably equivalent to the
schema (4) below:

    (∃x ∈ A)ℱ[x] → (∃x ∈ A)(ℱ[x] ∧ ¬(∃y ∈ A)(y P x ∧ ℱ[y]))    (4)


Proof. (a) Add schema (1) above, and prove schema (4): Fix ℱ, and let B =
A ∩ {x : ℱ[x]}. We add

    (∃x ∈ A)ℱ[x]    (hypothesis of (4))    (5)

(5) yields ∅ ≠ B ⊆ A; hence, by (1),

    (∃x ∈ B) B ∩ P(x) = ∅    (6)

(6) yields

    (∃x ∈ B)¬(∃y)(y P x ∧ y ∈ B)

which in turn yields

    (∃x ∈ A)(ℱ[x] ∧ ¬(∃y ∈ A)(y P x ∧ ℱ[y]))

This concludes the proof of (4).


(b) Conversely, assume (4) and prove (1): So let ∅ ≠ B ⊆ A for fixed B. The
class B is “given” by a class term {x : ℬ[x]}, so that the assumption yields

    (∃x ∈ A)ℬ[x]

By (4) we get

    (∃x ∈ A)(ℬ[x] ∧ ¬(∃y ∈ A)(y P x ∧ ℬ[y]))

which, in terms of B, reads

    (∃x ∈ A ∩ B) A ∩ P(x) ∩ B = ∅

which, in view of B ⊆ A, yields exactly what we want:

    (∃x ∈ B) B ∩ P(x) = ∅


VI.1.26 Corollary. If P has MC over A and B ⊆ A, then P has MC over B.


Proof. By VI.1.24 we add the schema

    ∅ ≠ C ⊆ A → (∃x ∈ C) C ∩ P(x) = ∅    (7)

and fix D with ∅ ≠ D ⊆ B ⊆ A. We want

    (∃x ∈ D) D ∩ P(x) = ∅

which we have by (7), since the hypothesis implies ∅ ≠ D ⊆ A.


VI.1.27 Example (Informal). (N, <), where < is the natural order, is a WO
set. (Z, <) is not. Define next ≺ on N^(n+1) by

    (x_1, ..., x_{n+1}) ≺ (y_1, ..., y_{n+1}) iff x_1 < y_1 ∧ x_i = y_i for i = 2, ..., n + 1

where “<” denotes the natural order on N. Then (N^(n+1), ≺) is a PO set (but not
a LO set) with MC.
Indeed, ≺ is irreflexive and transitive ((x_1, ..., x_{n+1}) ≺ (y_1, ..., y_{n+1}) ≺
(z_1, ..., z_{n+1}) means x_1 < y_1 < z_1 and x_i = y_i = z_i for i = 2, ..., n + 1; hence
(x_1, ..., x_{n+1}) ≺ (z_1, ..., z_{n+1})); therefore it is an order. Note that
(x_1, ..., x_{n+1}) and (y_1, ..., y_{n+1}) are non-comparable if x_2 ≠ y_2.


For any ∅ ≠ B ⊆ N^(n+1) the minimal elements are (n + 1)-tuples with
minimum first component.
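The order of Example VI.1.27 can be explored on a small grid. The following sketch takes n + 1 = 2 and the finite range {0, 1, 2} in place of N; these choices, and the names `prec` and `minimal_elements`, are assumptions made for illustration.

```python
from itertools import product

def prec(x, y):
    """x ≺ y iff x_1 < y_1 and x_i = y_i for every later component."""
    return x[0] < y[0] and x[1:] == y[1:]

def minimal_elements(B):
    return {b for b in B if not any(prec(a, b) for a in B)}

grid = set(product(range(3), repeat=2))
irreflexive = all(not prec(x, x) for x in grid)
transitive = all(prec(x, z) for x in grid for y in grid for z in grid
                 if prec(x, y) and prec(y, z))
total = all(x == y or prec(x, y) or prec(y, x) for x in grid for y in grid)

B = {(1, 0), (2, 0), (0, 1), (2, 1)}
mins = minimal_elements(B)
```

In this run the minimal elements of B are (1, 0) and (0, 1): each has the least first component among the tuples of B that share its remaining components, which is how the closing sentence of the example is read here.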
VI.2. Induction and Inductive Definitions


We have already seen the application of induction, both informally (over N) and 
formally (over w), as well as inductive definitions both over N (in definitions — 
cf. Section I.2 for the justification of this metatheoretical tool) and over w. The 
purpose of this section is to study the “induction phenomenon” further, since 
these techniques are commonplace in set theory (and logic in general). We will 
see that N and w do not hold a monopoly on inductive techniques and that 
we can do induction and inductive (or recursive) definitions over much more 
general, indeed “longer”, sets than the natural numbers.


VI.2.1 Informal Definition. A relation P (not necessarily an order) satisfies
the inductiveness condition, or has IC, iff for every class A

    (∀x)(P(x) ⊆ A → x ∈ A) → (∀x)x ∈ A    (1)

holds. Formula schema (1) is called the P-induction schema.


VI.2.2 Remark. As in the case of MC (cf. VI.1.23), operationally, the phrases 
“P has IC” and “let P have IC” are argot, respectively, for “(schema) (1) is 
provable” and “add schema (1) (as an axiom schema)”. 

Once again, we remind the reader that we are not quantifying over A in VI.2.1
any more than we are quantifying over a formula ℱ in the statement of the
collection axiom.


Practically speaking, what the induction schema (1) enables is as follows: If
we want to prove x ∈ A for the free variable x, and if we know of some relation
P that has IC,† then our task can be helped by the additional hypothesis, known
as the induction hypothesis (I.H. for short), P(x) ⊆ A.


This technique proves (∀x)x ∈ A by P-induction on (the variable) x.
Of course, what we have outlined above in English is how to prove

    P(x) ⊆ A → x ∈ A

via the deduction theorem; the usual restrictions on the free variables of P(x) ⊆
A apply.
Now, when employed in work within ZFC, A is just an argot name for the
class term {x : 𝒜[x]}; thus the P-induction schema (or principle), for any P
that has IC, can be restated without class names as

    (∀x)((∀y)(y P x → 𝒜[y]) → 𝒜[x]) → (∀x)𝒜[x]    (1a)

or, in English (again invoking the deduction theorem with the usual restrictions):
If P has IC, then to prove (∀x)𝒜[x] it suffices to prove 𝒜[x] with the help
of an additional “axiom” (induction hypothesis): that (∀y)(y P x → 𝒜[y]),
with all the free variables of this “axiom” frozen.


† We “know” because we either proved or assumed schema (1).


An elegant way to say the same thing is that “the property 𝒜 propagates
with P” in the sense that if all the (“values” of) y that are “predecessors” of x,
i.e., y P x, have the property, then so does x.†


If P is an order, then (1) will be immediately recognized in form. It
generalizes the well-known principle of course-of-values induction‡ over N.


One can easily verify that ∅ has IC.§ As in the case of the MC property, a relation
can be “cut down” until it acquires IC. In particular, this may come about by
the process of restriction.


† Here we are just trying to employ some visually suggestive nomenclature; thus we are forgetting
the “reality” that y is an “output” of P on “input” x and thus, in the intuition of cause and effect,
it comes after x. We are simply concentrating on the visual effect: y appears to the left of x in
the expression y P x.

‡ This is the name of the induction over N that takes the I.H. on 0, ..., n − 1, rather than just on
n − 1, in order to help the case for n. We have encountered this in a formal setting in Peano
arithmetic in volume 1, Chapter II. See also Exercise V.1.

§ This is hardly surprising, in view of VI.1.23 and VI.2.11 below.


VI.2.3 Definition. We say that P has IC over A just in case P | A has IC.


VI.2.4 Proposition. That P has IC over A is provably equivalent to the following
schema:

    (∀x ∈ A)(A ∩ P(x) ⊆ B → x ∈ B) → (∀x ∈ A)x ∈ B    (2)


Proof. Only-if part. Assume that P has IC over A, i.e., add the following schema:

    (∀x)(P | A(x) ⊆ D → x ∈ D) → (∀x)x ∈ D    (3)

To prove (2), fix B and add

    (∀x ∈ A)(A ∩ P(x) ⊆ B → x ∈ B)

that is,

    (∀x)(x ∉ A ∨ (A ∩ P(x) ⊆ B → x ∈ B))

or, provably equivalently,¶

    (∀x)(P | A(x) ⊆ B → x ∈ B ∪ Ā)    (4)

by VI.1.21. Since P | A(x) ⊆ B ∪ Ā implies P | A(x) ⊆ B (by VI.1.21), (4),
via tautological implication followed by ∀-monotonicity (I.4.24), finally
yields

    (∀x)(P | A(x) ⊆ B ∪ Ā → x ∈ B ∪ Ā)

which, by (3), proves (∀x)x ∈ B ∪ Ā, that is, (∀x ∈ A)x ∈ B. This
establishes (2).


If part. Assume (2), fix B, and calculate:

    (∀x)(P | A(x) ⊆ B → x ∈ B)

    ⟺ (tautological equivalence and Leibniz rule)

    (∀x)((x ∈ A → (P | A(x) ⊆ B → x ∈ B)) ∧ (x ∉ A → (P | A(x) ⊆ B → x ∈ B)))

    ⟺ (distributing ∀ over ∧, VI.1.21, and simplifying using Leibniz rule)

    (∀x ∈ A)(A ∩ P(x) ⊆ B → x ∈ B) ∧ (∀x)(x ∉ A → x ∈ B)

    ⟹ (using (2))

    (∀x)(x ∈ A → x ∈ B) ∧ (∀x)(x ∉ A → x ∈ B)

    ⟺ (distributing ∀ over ∧)

    (∀x)((x ∈ A → x ∈ B) ∧ (x ∉ A → x ∈ B))

    ⟺ (tautological equivalence and Leibniz rule)

    (∀x)x ∈ B

Thus, the top line implies the bottom line, as we need.


¶ Ā = U_M − A.


The practical outcome is this: To prove x ∈ A → x ∈ B, i.e., to prove that
A ⊆ B, one normally applies the deduction theorem, freezing the free variables
and assuming x ∈ A. The aim is then to prove x ∈ B instead. Now, VI.2.4
shows that if we know of some relation P that has IC over A, then we can use
an additional hypothesis (I.H.), namely,

    A ∩ P(x) ⊆ B

Of course, an additional hypothesis usually helps.


VI.2.5 Corollary. That P has IC over A is provably equivalent to the following
schema:

    (∀x ∈ A)((∀y ∈ A)(y P x → ℱ[y]) → ℱ[x]) → (∀x ∈ A)ℱ[x]


VI.2.6 Remark. In the above corollary (∀y ∈ A)(y P x → ℱ[y]) is, of course,
the I.H. The following formula is the induction step:

    (∀y ∈ A)(y P x → ℱ[y]) → ℱ[x]    (a)

What happened to our familiar (from “ordinary” induction over N, or ω) basis
step? The answer is that to prove the induction step (a) with x free entails that the
proof must be valid, in particular, for all the P-minimal elements of A, if any.†
Now, when considering the case where x is P-minimal in A, (a) is provably
equivalent to ℱ[x], which is a‡ basis case for x: Instead of proving (a),
prove ℱ[x].
Indeed, that ℱ[x] implies (a) is trivial. Conversely, since y ∈ A ∧ y P x is
refutable for an x that we have assumed to be P-minimal, (∀y)(y ∈ A ∧ y P x →
ℱ[y]) is provable, so that, if (a) is, so is ℱ[x] by modus ponens.


VI.2.7 Example (Informal). The “course-of-values <-induction” over N, as
it is outlined in the elementary literature (e.g., discrete mathematics texts), says
that to prove (∀n ∈ N)𝒫(n) one only need do (1) and (2) below:

    prove 𝒫(0)    (1)

and

    prove for every n ∈ N − {0} that (∀m < n)𝒫(m) implies 𝒫(n)    (2)

Stating (1) explicitly is standard folklore, but, as we have already remarked
in VI.2.6 above, we can actually merge (1) and (2) into

    (∀n ∈ N)((∀m < n)𝒫(m) → 𝒫(n))


VI.2.8 Remark. In practice, Corollary VI.2.5 is often applied in such a way
that the “verification” on the P-minimal elements of A is stated and performed
explicitly:


Basis cases: One proves ℱ[x] on the assumption that x ∈ A is P-minimal.
Induction step: One proves ℱ[x] on the assumption that x ∈ A is not
minimal, using as I.H. that (∀y ∈ A)(y P x → ℱ[y]).

Of course, the usual precautions that one takes when applying the deduction
theorem are taken.


† It turns out that P has MC over A, so that A does have minimal elements (Theorem VI.2.11
below).
‡ “A” rather than “the”, since there may be many minimal elements.


VI.2.9 Definition. For any relation P, an infinite descending P-chain is a
function f with the properties

(1) dom(f) = ω, and

(2) (∀n ∈ ω) f(n + 1) P f(n).


Intuitively, an infinite descending P-chain is a sequence a_0, a_1, ... such
that ... a_3 P a_2 P a_1 P a_0.


VI.2.10 Informal Definition. A relation P is well-founded iff it has no infinite 
descending chains. 


P is well-founded over A iff P | A is well-founded. 


Intuitively, P is well-founded if the universe U_M cannot contain an infinite
descending chain, while it is well-founded over A if A cannot contain an infinite
descending chain. Clearly, no infinite descending P-chain can start anywhere
outside dom(P) in any case.


There is some disagreement on the term “well-founded”. In some of the
literature it applies definitionally to what we have called relations “with MC”.
However, in the presence of AC well-founded relations are precisely those that
have MC, so the slight confusion, if any, is harmless.


VI.2.11 Theorem. For any relation P the following are provable: 


(1) P has MC over a class A iff it has IC over A. 
(2) If P has MC over A, then P is well-founded over A. 


Proof. (1): Consider the schema in VI.1.25 and the schema in VI.2.5. The
former schema is that of MC over A, while the latter is that of IC over A. It is
trivial to verify that an instance of any one of the two schemata realized with
a formula $\mathscr{F}$ is provably equivalent to the contrapositive of the instance of the
other realized with the formula $\neg\mathscr{F}$.


(2): Let instead f be an infinite descending $P \mid A$-chain. Then $\emptyset \neq \mathrm{ran}(f) \subseteq A$,
and hence there is an $a \in \mathrm{ran}(f)$ which is $P \mid A$-minimal. Now, $a = f(n)$
for some $n \in \omega$, but $f(n+1)\,(P \mid A)\,f(n)$, contradicting the $P \mid A$-minimality
of a.


VI.2.12 Corollary. If P has IC over A and $B \subseteq A$, then P has IC over B.


Proof. By VI.1.26.

VI.2.13 Corollary. Let A be a set. Then the following are provably equivalent:


(1) P has MC over A. 
(2) P has IC over A. 
(3) P is well-founded over A. 


Proof. We only need to prove that (3) implies (1). So assume (3), and let (1)
fail. Let $\emptyset \neq B \subseteq A$ be such that B has no P-minimal elements. Pick an $a_0 \in B$.
Since it cannot be P-minimal, pick an $a_1 \in B$ such that $a_1\,P\,a_0$. Since $a_1$ cannot
be P-minimal, pick an $a_2 \in B$ such that $a_2\,P\,a_1$.

This process can continue ad infinitum to yield an infinite descending
chain $\ldots a_3\,P\,a_2\,P\,a_1\,P\,a_0$ in A, contradicting (3).

This argument used AC, and more formally it goes like this: Let g be a choice
function for $\mathcal{P}(B) - \{\emptyset\}$.† Define f on ω by recursion as


$$f(n) = \begin{cases} g(B) & \text{if } n = 0 \\ g\bigl(B \cap P(f(n-1))\bigr) & \text{if } n > 0 \end{cases}$$


f is total on ω, for $B \cap P(f(n-1)) \neq \emptyset$ for all $n > 0$, by assumption (cf.
VI.1.24). By $g(x) \in x$ for all $x \in \mathcal{P}(B) - \{\emptyset\}$, we have $f(n) \in P(f(n-1))$,
i.e., $f(n)\,P\,f(n-1)$ for all $n > 0$; thus f is an infinite descending chain.
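For a finite set A the equivalence of MC and well-foundedness can be checked by brute force, which may help in building intuition. The following is a minimal sketch, assuming a relation is encoded as a Python set of pairs (y, a) meaning y P a; the function names are ours and purely illustrative. On a finite set an infinite descending chain exists exactly when the restricted relation has a cycle, and that is what the second test looks for.

```python
from itertools import combinations

def minimal_elements(P, B):
    """Elements a of B with no y in B such that y P a."""
    return {a for a in B if not any((y, a) in P for y in B)}

def has_mc(P, A):
    """MC over A: every nonempty subset of A has a P-minimal element."""
    items = list(A)
    return all(minimal_elements(P, set(B))
               for r in range(1, len(items) + 1)
               for B in combinations(items, r))

def has_descending_chain(P, A):
    """On a finite A, an infinite descending chain exists iff P has a cycle in A."""
    preds = {a: {y for y in A if (y, a) in P} for a in A}
    alive = set(A)
    while True:
        # discard elements that have no predecessor left
        removable = {a for a in alive if not (preds[a] & alive)}
        if not removable:
            return bool(alive)
        alive -= removable

A = {1, 2, 3, 4}
P_good = {(1, 2), (2, 3), (1, 4)}           # acyclic
P_bad = {(1, 2), (2, 3), (3, 1), (1, 4)}    # contains the cycle 1, 2, 3
```

In the cyclic case the subset {1, 2, 3} is exactly a set B without P-minimal elements, as in the proof above.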


VI.2.14 Remark. The corollary goes through for any class A, not just a set A, 
as we will establish later. 

It is also noted that a weaker version of AC was used in the proof, the so-called
axiom of dependent choices, namely that “if P is a relation and $B \neq \emptyset$ a
set such that $(\forall x \in B)(\exists y \in B)\, y\,P\,x$, then there is a total function $f : \omega \to B$
such that $(\forall n \in \omega)\, f(n+1)\,P\,f(n)$.” □


VI.2.15 Example. If P is well-founded, then it is irreflexive. Indeed, if $a\,P\,a$
for some a, then $\lambda n.a$ on ω is an infinite descending chain ($\ldots a\,P\,a\,P\,a\,P\,a$).

By Theorem VI.2.11, if P has IC (equivalently MC), then it is irreflexive.

If P is irreflexive but not well-founded, is then $P^+$ a partial order? (A legitimate
question, since $P^+$ is transitive.) Well, no, for consider
$R = \{\langle 1,2\rangle, \langle 2,3\rangle, \langle 3,1\rangle\}$, which is irreflexive. Now
$R^+ = \{\langle 1,1\rangle, \langle 2,2\rangle, \langle 3,3\rangle, \langle 1,2\rangle, \langle 2,3\rangle, \langle 3,1\rangle, \langle 1,3\rangle, \langle 2,1\rangle, \langle 3,2\rangle\}$,
which is not a partial order (it is reflexive), nor
the reflexive closure of one, since it is not antisymmetric (e.g., $1\,R^+\,3 \wedge 3\,R^+\,1$
requires 1 = 3).


It turns out that if P has MC, then so does $P^+$, and hence, in particular, it is
a partial order, being irreflexive.
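The three-element counterexample of VI.2.15 is small enough to verify mechanically. The sketch below assumes relations are encoded as Python sets of pairs; `transitive_closure` is our own helper, not notation from the text.

```python
def transitive_closure(R):
    """Smallest transitive relation containing R (R is a set of pairs)."""
    closure = set(R)
    while True:
        new = {(a, d) for (a, b) in closure for (c, d) in closure if b == c}
        if new <= closure:
            return closure
        closure |= new

R = {(1, 2), (2, 3), (3, 1)}
R_plus = transitive_closure(R)

irreflexive = all(a != b for (a, b) in R)
reflexive_on_field = all((a, a) in R_plus for a in {1, 2, 3})
antisymmetric = all(a == b for (a, b) in R_plus if (b, a) in R_plus)
```

Here `R_plus` consists of all nine pairs over {1, 2, 3}, so it is reflexive and fails antisymmetry, as claimed.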


† Proof by auxiliary constant, “g”.

VI.2.16 Theorem. If P has MC (IC), then so does $P^+$.


Proof. Let $\emptyset \neq A$ and $a \in A$ be P-minimal, i.e.,

$$P(a)\uparrow \qquad (1)$$

Suppose now that $b\,P^+\,a$ for some b. Then, for some f with $\mathrm{dom}(f) = n \in \omega$
and $n \geq 2$ (why $n \geq 2$?), we have $f(0) = a$, $f(n-1) = b$, and $f(i)\,P\,f(i-1)$
for $i = 1, \ldots, n-1$ (V.2.8 and V.2.24). In particular, $f(1)\,P\,f(0)$, which
contradicts (1). Therefore a is also $P^+$-minimal.


VI.2.17 Corollary. If P has MC (IC) over A, then $(P \mid A)^+$ has MC (IC).


Proof. It is given that $P \mid A$ has MC (IC). By VI.2.16, $(P \mid A)^+$ has MC (IC).


We cannot sharpen the above to “$P^+$ has MC (IC) over A”, for that means that
$P^+ \mid A$ has MC. The latter is not true, though: Let O be the odd natural numbers,
and R be defined on N by $x\,R\,y$ iff $x = y + 1$; thus $R^+ = {>}$.

Now, R has MC over O (for $R \mid O = \emptyset$), yet $R^+$ does not, for $R^+ \mid O$ has an
infinite descending chain in O:


$$\cdots > 7 > 5 > 3 > 1$$


In particular, we note from this example that $(P \mid A)^+ \neq P^+ \mid A$ in general. □
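The difference between the two relations can be seen concretely on a finite truncation of N. This is only an illustrative sketch: it assumes relations are sets of pairs, replaces N by {0, ..., 19}, and reads the restriction $P \mid A$ as keeping the pairs with both components in A (which is what $R \mid O = \emptyset$ requires).

```python
def transitive_closure(R):
    """Smallest transitive relation containing R (R is a set of pairs)."""
    closure = set(R)
    while True:
        new = {(a, d) for (a, b) in closure for (c, d) in closure if b == c}
        if new <= closure:
            return closure
        closure |= new

def restrict(R, A):
    """Keep only the pairs of R with both components in A."""
    return {(x, y) for (x, y) in R if x in A and y in A}

N = range(20)                                  # finite stand-in for N
R = {(y + 1, y) for y in N if y + 1 in N}      # x R y iff x = y + 1
O = {n for n in N if n % 2 == 1}               # the odd numbers

closure_of_restriction = transitive_closure(restrict(R, O))   # (R | O)+
restriction_of_closure = restrict(transitive_closure(R), O)   # R+ | O
```

The first relation is empty, while the second is the usual > on the odd numbers in range, which is where the descending chain lives.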


VI.2.18 Example. Let ≺ on ω be defined by $n \prec m$ iff $m = n + 1$. It is obvious
that ≺ is well-founded; hence it has MC and IC by VI.2.13.


What is ≺-induction? For notational convenience let “$(\forall x)$” stand for
“$(\forall x \in \omega)$”. Thus, for any formula $\mathscr{F}(x)$,

$$(\forall n)\bigl((\forall x \prec n)\,\mathscr{F}(x) \to \mathscr{F}(n)\bigr) \to (\forall n)\,\mathscr{F}(n)$$

holds. In other words, if $\mathscr{F}(0)$ is proved [this is provably equivalent to
$(\forall x \prec 0)\,\mathscr{F}(x) \to \mathscr{F}(0)$; see VI.2.6] and if also $\mathscr{F}(n-1) \to \mathscr{F}(n)$ is proved
under the assumption $n > 0$, then $(\forall n)\,\mathscr{F}(n)$ is proved.

This is just our familiar “simple” (as opposed to “course-of-values”) induction
over ω, stemming from the fact that ω is the smallest inductive set (see
V.1.5 and V.1.6).


The “natural” < on ω (i.e., ∈) is $\prec^+$. <-induction over ω coincides with the
course-of-values induction over ω.


VI.2.19 Example. We already know that the axiom of foundation yields that
∈ has MC. Therefore properties of sets can be proved by ∈-induction over $\mathbb{U}_M$.

VI.2.20 Example (Double Induction over ω). (See also Chapter V, p. 248.)
We often want to prove


$$(\forall m)(\forall n)\,\mathscr{F}(m,n) \qquad (1)$$

for some formula $\mathscr{F}$ and m, n ranging over ω. The obvious approach, which
often works, is to do induction on, say, m only, treating n as a “parameter”. That
is (assuming the problem can be handled by “simple” induction):

(i) Prove $(\forall n)\,\mathscr{F}(0,n)$.

(ii) For $m \geq 0$ prove $(\forall n)\,\mathscr{F}(m+1,n)$ from the I.H. $(\forall n)\,\mathscr{F}(m,n)$.

Sometimes steps (i) and/or (ii) are not easy, and can be helped by induction on
n, that is:

(iii) Prove $\mathscr{F}(0,0)$.
(iv) For $n \geq 0$ prove $\mathscr{F}(0,n+1)$ from the I.H. (on n) $\mathscr{F}(0,n)$,


which settles (i) by induction on n, and then


(v) For $m \geq 0$ prove $\mathscr{F}(m+1,0)$, from the I.H. of (ii) above.
(vi) For $m \geq 0$, $n \geq 0$ prove $\mathscr{F}(m+1,n+1)$ from the assumptions
(a) I.H. on n, namely, $\mathscr{F}(m+1,n)$, and
(b) I.H. on m ((ii) above).
Let us revisit the above “cascaded” induction from a different point of view.
Define ≺ on ω × ω by


$$\langle a,b\rangle \prec \langle c,d\rangle \quad\text{iff}\quad c = a+1 \;\vee\; (a = c \wedge d = b+1)$$


It is clear that ≺ is well-founded; hence it has IC over $\omega^2$.
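To get a feel for this relation on pairs, one can list predecessors inside a finite grid. The sketch below is only an illustration under stated assumptions: pairs are Python tuples, the grid size `SIZE` is arbitrary, and `precedes` is our name for the relation just defined.

```python
def precedes(p, q):
    """<a, b> precedes <c, d> iff c = a + 1, or a = c and d = b + 1."""
    (a, b), (c, d) = p, q
    return c == a + 1 or (a == c and d == b + 1)

SIZE = 5                                       # finite window into omega x omega
grid = [(m, n) for m in range(SIZE) for n in range(SIZE)]

# elements of the grid with no predecessor in the grid
minimal = [q for q in grid if not any(precedes(p, q) for p in grid)]

# the induction hypothesis available at <2, 3> inside the grid
preds_of_2_3 = sorted(p for p in grid if precedes(p, (2, 3)))
```

The predecessors of a pair are its left neighbour in the same row together with every pair of the previous row, which matches the two induction hypotheses used in step (vi).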


What is the proof of (1) by ≺-induction?


(vii) Prove $\mathscr{F}(0,0)$ ($\langle 0,0\rangle$ is the unique ≺-minimal element in $\omega^2$); this is
step (iii).
(viii) For non-minimal $\langle m,n\rangle$ prove $\mathscr{F}(m,n)$ from the I.H.
$\langle r,s\rangle \prec \langle m,n\rangle \to \mathscr{F}(r,s)$.

Item (viii) splits into the following cases:
• $m = 0$. Then prove $\mathscr{F}(0,n)$ from $\mathscr{F}(0,n-1)$ (why is $n > 0$?); this is
step (iv).
• $n = 0$. Then prove $\mathscr{F}(m,0)$ from $(\forall n)\,\mathscr{F}(m-1,n)$ (why is $m > 0$?); this is
step (v).
• $m > 0$, $n > 0$. Finally prove $\mathscr{F}(m,n)$ from $\mathscr{F}(m,n-1)$ and $(\forall n)\,\mathscr{F}(m-1,n)$; this is step (vi).

VI.2.21 Example. It is clear now that since sets such as $\omega - \{0\}$, $\mathbb{N} \cup \{-3, -2, -1\}$
are well-ordered (by <), we can carry induction proofs over them. In the
former case the “basis” case is at 1; in the latter case it is at −3. □


We next turn to recursive (or inductive) definitions with respect to an arbitrary 
P that has IC: first one that is also an order, and then one that is not necessarily 
an order. 


VI.2.22 Definition (Left-Narrow Relations). (Levy (1979).) A relation P is 
left-narrow iff P(x) is a set for all x. It is left-narrow over A iff P| A is left- 
narrow. 

Left-narrow relations are also called set-like (Kunen (1980)). 


VI.2.23 Example. ∈ is left-narrow, while ∋ is not.


VI.2.24 Definition (Segments). If < is an order on A and $a \in A$, then the class
${<}(a)$ is called the (initial) segment defined by a, while the class ${\leq}(a)$ is called
the closed segment defined by a.


≤ is $r_A(<)$, of course, so that ${\leq}(a) = {<}(a) \cup \{a\}$. Segments of left-narrow
relations are sets. □


VI.2.25 Theorem (Recursive or Inductive Definitions). Let $< : A \to A$ be a
left-narrow order with IC, and G a (not necessarily total) function
$G : A \times \mathbb{U}_M \to X$ for some class X. Then there exists a unique function $F : A \to X$
satisfying:

$$(\forall a \in A)\, F(a) \simeq G(a, F \upharpoonright {<}(a)) \qquad (1)$$


The requirement of left-narrowness guarantees (with the help of collection;
cf. III.11.28) that the second argument of G in (1) is a set. This restriction does
not adversely affect applicability of the theorem, as the reader will be able to
observe in the sequel.

Also recall that “≃” is Kleene’s weak equality, so that in the recurrence (1)
above we have either both sides defined and equal (as sets or atoms), or both
undefined (see III.11.17). □
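On a finite ordered set the recurrence (1) can be executed directly, which gives a concrete picture of what the theorem asserts. The following is a minimal sketch, not the construction used in the proof: it assumes A is a finite Python set, `less` is a strict order on it, the history $F \upharpoonright {<}(a)$ is passed as a dict, and an undefined value of G is signalled by the exception `Undefined` (all of these names are ours).

```python
class Undefined(Exception):
    """Raised by G to signal that G(a, history) is undefined."""

def define_by_recursion(A, less, G):
    """Return F as a dict with F(a) ~ G(a, F restricted to {b : b < a}).

    Arguments at which G is undefined are left out of the dict,
    so F may be a partial function on A.
    """
    F = {}
    pending = set(A)
    while pending:
        # pick an element with no unprocessed predecessor
        a = next(x for x in pending
                 if not any(less(b, x) for b in pending))
        history = {b: F[b] for b in F if less(b, a)}
        try:
            F[a] = G(a, history)
        except Undefined:
            pass
        pending.remove(a)
    return F

# Hypothetical example: height of a divisor of 12 under proper divisibility.
divisors = {1, 2, 3, 4, 6, 12}
proper_divides = lambda b, a: b != a and a % b == 0
rank = define_by_recursion(
    divisors, proper_divides,
    lambda a, h: 1 + max(h.values(), default=-1))
```

The order in which minimal pending elements are picked does not matter; that independence is the finite shadow of the uniqueness claim.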


Proof. The proof is essentially the same as that for recursive definitions over
an inductively defined set that we carried out in the metatheory in I.2.13. See
also the proof of V.1.21.

We prove uniqueness first, so let $H : A \to X$ also satisfy (1). Let $a \in A$, and
adopt the I.H. that


$$(\forall b < a)\, F(b) \simeq H(b)$$

that is, $b < a \to (\forall y)(\langle b,y\rangle \in F \leftrightarrow \langle b,y\rangle \in H)$, and therefore

$$F \upharpoonright {<}(a) = H \upharpoonright {<}(a)$$

It follows that (writing “≃” conjunctionally)


$$F(a) \simeq G(a, F \upharpoonright {<}(a)) \simeq G(a, H \upharpoonright {<}(a)) \simeq H(a)$$


This settles the claim of uniqueness: $(\forall a \in A)\, F(a) \simeq H(a)$, that is, F = H.
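Define now

$$K = \bigl\{ f : (\exists a \in A)\bigl( f : {\leq}(a) \to X \;\wedge\; (\forall x \in {\leq}(a))\, f(x) \simeq G(x, f \upharpoonright {<}(x)) \bigr) \bigr\} \qquad (2)$$


“$f : {\leq}(a) \to X$” stands for “f is a function ${\leq}(a) \to X$”. Thus, of course,
in particular, f is a class (not an atom). By left-narrowness and III.11.28, any
such f is actually a set, so we can “define” the class term K.

A classless way of stating (2) is to let $A = \{x : \mathscr{A}(x)\}$,
$G = \{\langle\langle x,y\rangle, z\rangle : \mathscr{G}(x,y,z)\}$, and $X = \{x : \mathscr{X}(x)\}$, adding the assumptions


$$\mathscr{G}(x,y,z) \to \mathscr{A}(x) \wedge \mathscr{X}(z)$$
and
$$\mathscr{G}(x,y,z) \wedge \mathscr{G}(x,y,z') \to z = z'$$


We then simply name the formula below “$\mathscr{H}(f,a)$”, and let
$K = \{f : (\exists a)\,\mathscr{H}(f,a)\}$.


$$\begin{aligned}
&\neg U(f) \wedge \mathscr{A}(a) \wedge (\forall z)\bigl(z \in f \to OP(z) \wedge \pi(z) \leq a \wedge \mathscr{X}(\delta(z))\bigr) \\
&\wedge\; (\forall x)(\forall y)(\forall z)(z\,f\,x \wedge y\,f\,x \to y = z) \\
&\wedge\; (\forall x \in {\leq}(a))(\forall y)\bigl(y\,f\,x \leftrightarrow \mathscr{G}(x, \{\langle u,v\rangle : v\,f\,u \wedge u < x\}, y)\bigr)
\end{aligned} \qquad (2')$$


In the formal description of K (i.e., (2′)) we have added the conjunct $\neg U(f)$
to exclude atoms. Informally we do not have to do this, since a function f is a
class (our only concern being whether it is proper or not).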

We also note that $K \neq \emptyset$. For example, if $a \in A$ is <-minimal, then ${\leq}(a) = \{a\}$,
${<}(a) = \emptyset$, and hence $f \upharpoonright {<}(a) = \emptyset$ for any f; thus K contains $\{\langle a, G(a,\emptyset)\rangle\}$
if $G(a,\emptyset)\downarrow$, else it contains the empty function $\emptyset : {\leq}(a) \to X$. Indeed, for the
latter we have $\emptyset(a) \simeq G(a, \emptyset \upharpoonright {<}(a))$, since both sides are undefined. □

Since the uniqueness argument above does not depend on the particular left
field A, but only on the fact that < has IC over A, the same proof of uniqueness
applies to the case that the left field is ${\leq}(a)$ (a subset of A),† showing that


$$\vdash_{ZFC} \mathscr{H}(f,a) \wedge \mathscr{H}(g,a) \to f = g \qquad (3)$$

We have at once‡

$$\vdash_{ZFC} \mathscr{H}(f,a) \to \mathscr{H}(g,b) \to f(x)\downarrow \to g(x)\downarrow \to f(x) = g(x) \qquad (3')$$


because ${\leq}(x) \subseteq {\leq}(a) \cap {\leq}(b)$ by transitivity;§ hence $f \upharpoonright {\leq}(x) = g \upharpoonright {\leq}(x)$ by (3).
Now


$$F = \bigcup K \text{ is a function } F : A \to X \qquad (4)$$


$$F(x) = y \leftrightarrow (\exists f)(\exists a)\bigl(\mathscr{H}(f,a) \wedge f(x) = y\bigr)$$


Thus, if F(x) = y and F(x) = z, then (using auxiliary constants f, g, a, b) we
add


$$f(x) = y \wedge \mathscr{H}(f,a) \qquad (5)$$
and
$$g(x) = z \wedge \mathscr{H}(g,b) \qquad (6)$$


from which (and (3′)) we derive y = z.
In preparation for our final step we note that

$$\vdash_{ZFC} \mathscr{H}(f,a) \to f = F \upharpoonright {\leq}(a) \qquad (7)$$


Assume the hypothesis, and let $x \leq a$. Then, f(x) = y implies F(x) = y by
(4). Conversely, under the same hypotheses, $\mathscr{H}(f,a)$ and $x \leq a$, assume
also F(x) = z. This leads (say, via new constants g and b) to (6). Since
$x \in {\leq}(a) \cap {\leq}(b)$, we get f(x) = g(x) = F(x) (cf. the footnote related to (3′)).


† Of course, < has IC over ${\leq}(a)$ for any $a \in A$ (cf. VI.2.12).
‡ If we had known that $x \in {\leq}(a) \cap {\leq}(b)$, then only one of the hypotheses $f(x)\downarrow$ or $g(x)\downarrow$ would
have sufficed.
§ We have just used the assumption that < is an order.


We finally verify that F satisfies the recurrence (1) of the theorem. Indeed, let
F(x) = y, which, using auxiliary constants f and a, leads to the assumption (5)
above. Then, by $\mathscr{H}(f,a)$ (specifically, specializing the last conjunct of (2′))
we get


$$\mathscr{G}(x, \{\langle u,v\rangle : v\,f\,u \wedge u < x\}, y)$$
Hence,
$$\mathscr{G}(x, \{\langle u,v\rangle : v\,F\,u \wedge u < x\}, y)$$
by (7). That is,
$$\vdash_{ZFC} F(x) = y \to G(x, F \upharpoonright {<}(x)) = y$$
We now want the converse,
$$\vdash_{ZFC} G(x, F \upharpoonright {<}(x)) = y \to F(x) = y \qquad (8)$$
Let $a \in A$ be <-minimal for which (8) fails. This failure means that we have
$$G(a, F \upharpoonright {<}(a)) = b \text{ for some appropriate } b \in X \qquad (9)$$
yet¶
$$F(a)\uparrow \qquad (10)$$
We define $h = F \upharpoonright {<}(a)$ (a renaming of convenience), which is a function.


By minimality of a, the function F, and hence h, satisfy the recurrence (1) on
${<}(a)$, that is


$$(\forall x \in {<}(a))\, h(x) \simeq G(x, h \upharpoonright {<}(x)) \qquad (11)$$


The function $f = h \cup \{\langle a,b\rangle\}$ satisfies $(\forall x \in {\leq}(a))\, f(x) \simeq G(x, f \upharpoonright {<}(x))$,
because of (9). Hence $f \subseteq F$ by (4). Now, f(a) = b contradicts (10).


¶ Clearly, by the direction already proved, $F(a)\downarrow$ is incompatible with the failure of (8).


VI.2.26 Remark. (1) Pretending that the above proof took place in the metatheory,
one can view it as “constructively” demonstrating the “existence” of a class
F with the stated properties. Formally, we cannot quantify over classes. Thus,
to prove “$(\forall A) \ldots A \ldots$” one proves the schema “$\ldots A \ldots$” for the arbitrary
$\mathscr{A}$ (that “defines” $A = \{x : \mathscr{A}\}$). To prove “$(\exists A) \ldots A \ldots$” one must exhibit a
specific formula $\mathscr{A}$ (that gives rise to A as above) for which we can prove (the
formal translation of) “$\ldots A \ldots$”.


In particular, what we really did in the above proof were two things:
(a) We stated a formula $\mathscr{F}(x,y)$, displayed below, that was built from given
formulas:


$$(\exists f)(\exists a)\bigl(\mathscr{H}(f,a) \wedge \langle x,y\rangle \in f\bigr)$$
where $\mathscr{H}$ is given by (2′). We then proved the theorems


$$\mathscr{F}(x,y) \wedge \mathscr{F}(x,y') \to y = y' \qquad (*)$$


and (1) of the theorem, using in the latter case the abbreviation F(x) = y for
$\mathscr{F}(x,y)$. This was our “existence” proof.


The reader will have absolutely no trouble verifying that if instead of G we
have a function symbol, G, of arity 2, and a relation < with IC (dropping A and
X), then we can introduce a function symbol of arity 1, F, so that the following
holds:


$$\vdash_{ZFC} (\forall a)\, F(a) = G(a, F \upharpoonright {<}(a))$$
Indeed, F can be introduced by


$$F(x) = y \leftrightarrow (\exists f)(\exists a)\bigl(\mathscr{H}(f,a) \wedge f(x) = y\bigr) \qquad (**)$$


since we can prove, under these assumptions,† that


$$\vdash_{ZFC} (\forall x)(\exists! y)(\exists f)(\exists a)\bigl(\mathscr{H}(f,a) \wedge f(x) = y\bigr)$$


Even in the presence of an A we can always introduce F (using the techniques
in III.2.4) by saying “under the assumption $x \in A$, (∗∗) is derivable, while
under the assumption $x \notin A$, $F(x) = \emptyset$ is derivable” (definition by cases). (See
Exercise VI.5.)


(b) The uniqueness part showed that our “solution” $\mathscr{F}$ is unique within equivalence:
Any other formula $\mathscr{F}'$ that is functional (i.e., satisfies (∗) above with
$\mathscr{F}$ replaced by $\mathscr{F}'$) and “solves” (1) is provably equivalent to $\mathscr{F}$:


$$\mathscr{A}(x) \to \bigl(\mathscr{F}(x,y) \leftrightarrow \mathscr{F}'(x,y)\bigr)$$
The above discussion makes it clear that using class terminology and notation
was a good idea.


† G is obtained from $\mathscr{G}$ as in III.11.20. Conversely, starting with $G = \{\langle\langle x,y\rangle, z\rangle : \mathscr{G}(x,y,z)\}$, a
function, we have $\mathscr{G}(x,y,z) \to \mathscr{G}(x,y,z') \to z = z'$. We can then introduce G by
$G(x,y) = z \leftrightarrow \mathscr{G}(x,y,z)$.


(2) The recursion on the natural numbers (V.1.21) is a special case of VI.2.25:
Indeed,


$$f(0) = a$$
$$\text{for } n \geq 0, \quad f(n+1) = g(n, f(n))$$


can be rewritten as


$$(\forall n \in \omega)\, f(n) \simeq G(n, f \upharpoonright {<}(n)) \simeq G(n, f \upharpoonright n)$$


where

$$G(n,h) = \begin{cases} a & \text{if } n = 0 \\ g(n-1, h(n-1)) & \text{if } h \text{ is a function} \wedge \mathrm{dom}(h) = n > 0 \\ \uparrow & \text{otherwise} \end{cases}$$


Note that G on $\omega \times \mathbb{U}_M$ is nontotal. In particular, if the second argument is not
of the correct type (middle case above), G will be undefined. We can still prove
that $f(n)\downarrow$ for all $n \in \omega$, without using V.1.21.

Assume the claim for $m < n$ (I.H.). For n = 0, we have $f(0) \simeq G(0, \emptyset) = a$,
defined. Let next n > 0. Now $f(n) \simeq G(n, f \upharpoonright n)$ and $\mathrm{dom}(f \upharpoonright n) = n$ by I.H.;
hence $f(n) = g(n-1, (f \upharpoonright n)(n-1)) = g(n-1, f(n-1))$, defined, since g
is total.


(3) In view of the above, it is worth noting that a recursive definition a 
la VI.2.25 can still define a total function, even if G is nontotal. 
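The passage from the pair (a, g) to the nontotal G of part (2) can be mimicked in code. This is a hedged sketch under our own encoding: histories are Python dicts, “undefined” is an exception named `Undefined`, and `make_G` and `recurse` are illustrative helper names.

```python
class Undefined(Exception):
    """Signals that G(n, h) is undefined."""

def make_G(a, g):
    """Build G from the data of f(0) = a, f(n + 1) = g(n, f(n))."""
    def G(n, h):
        if n == 0:
            return a
        # h must be a function with dom(h) = n > 0
        if isinstance(h, dict) and set(h) == set(range(n)):
            return g(n - 1, h[n - 1])
        raise Undefined((n, h))
    return G

def recurse(G, bound):
    """Compute f on {0, ..., bound - 1} from f(n) ~ G(n, f restricted to n)."""
    f = {}
    for n in range(bound):
        f[n] = G(n, dict(f))       # dict(f) is the history f restricted to n
    return f

# Hypothetical instance: a = 1 and g(n, y) = (n + 1) * y give the factorial.
factorial = recurse(make_G(1, lambda n, prev: (n + 1) * prev), 6)
```

Although G raises `Undefined` on histories of the wrong shape, `recurse` never triggers it, which is the content of remark (3).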


VI.2.27 Corollary (Definition by Recursion with Respect to an Arbitrary
Relation with IC†). Let $P : A \to A$ be a left-narrow relation, not necessarily
an order, with IC, and G a (not necessarily total) function $G : A \times \mathbb{U}_M \to X$
for some class X. Then there exists a unique function $F : A \to X$ satisfying


$$(\forall a \in A)\, F(a) \simeq G(a, F \upharpoonright P(a))$$


Proof. Define $\widetilde{G} : A \times \mathbb{U}_M \to X$ by


$$\widetilde{G}(a,f) = \begin{cases} \uparrow & \text{if } f \text{ is not a function} \\ G(a, f \upharpoonright P(a)) & \text{otherwise} \end{cases}$$


Let < stand for $P^+$. Now < is an order on A with IC by VI.2.16. Moreover it
is left-narrow by the axiom of union, since (V.2.24)


$$P^+(a) = \bigcup \{P^n(a) : n \in \omega - \{0\}\}$$


and an easy induction on n shows that each $P^n(a)$ is a set (Exercise VI.2). Thus,
by VI.2.25, there is a unique $F : A \to X$ such that


$$(\forall a \in A)\, F(a) \simeq \widetilde{G}(a, F \upharpoonright {<}(a)) \simeq G\bigl(a, (F \upharpoonright {<}(a)) \upharpoonright P(a)\bigr) \qquad (1)$$


Now, $P(a) \subseteq {<}(a)$ yields $(F \upharpoonright {<}(a)) \upharpoonright P(a) = F \upharpoonright P(a)$; hence (1) becomes


$$(\forall a \in A)\, F(a) \simeq G(a, F \upharpoonright P(a))$$


† This idea is due to Montague (1955) and Tarski (1955).




VI.2.28 Corollary (Recursion with a Total G). Let $P : A \to A$ be a left-narrow
relation, not necessarily an order, with IC, and G a total function
$G : A \times \mathbb{U}_M \to X$, for some class X. Then there exists a unique total function
$F : A \to X$ satisfying


$$(\forall a \in A)\, F(a) = G(a, F \upharpoonright P(a))$$


Proof. We only need to show that dom(F) = A. By VI.2.27, there is a unique
F satisfying
$$(\forall a \in A)\, F(a) \simeq G(a, F \upharpoonright P(a))$$


But the right-hand side of ≃ is defined for all $a \in A$; thus we can use “=”
instead of “≃” in the statement of VI.2.27.


VI.2.29 Corollary (Recursive Definition with Parameters I). Let $P : A \to A$
be a left-narrow relation, not necessarily an order, with IC, and G a (not
necessarily total) function $G : S \times A \times \mathbb{U}_M \to X$, for some classes S and X.
Then there exists a unique function $F : S \times A \to X$ satisfying†


$$(\forall \langle s,a\rangle \in S \times A)\, F(s,a) \simeq G(s, a, \{\langle s,x,F(s,x)\rangle : x\,P\,a\}) \qquad (1)$$

In equation (1) s persists throughout (unchanged); hence it is called a parameter.


Proof. Define the relation $\widetilde{P}$ on $S \times A$ by
$$\langle u,a\rangle\,\widetilde{P}\,\langle v,b\rangle \quad\text{iff}\quad u = v \wedge a\,P\,b$$
It is clear that $\widetilde{P}$ has MC. Now, (1) can be rewritten as


$$(\forall \langle s,a\rangle \in S \times A)\, F(s,a) \simeq G(s, a, \{\langle s,x,F(s,x)\rangle : \langle s,x\rangle\,\widetilde{P}\,\langle s,a\rangle\}) \simeq G(s, a, F \upharpoonright \widetilde{P}(\langle s,a\rangle))$$


The result follows from VI.2.27 by using J given below as “G-function”:


$$J(g,f) = \begin{cases} \uparrow & \text{if } g \notin S \times A \\ G(\pi(g), \delta(g), f) & \text{otherwise} \end{cases}$$


† “$(\forall \langle s,a\rangle \in S \times A)$” is argot for “$(\forall z)(OP(z) \wedge \pi(z) \in S \wedge \delta(z) \in A \to \ldots$”, or, simply,
“$(\forall s \in S)(\forall x \in A)$”.

VI.2.30 Corollary (Recursive Definition with Parameters II). Let all assumptions
be as in Corollary VI.2.29, except that the recurrence now reads


$$(\forall \langle s,a\rangle \in S \times A)\, F(s,a) \simeq G(s, a, \{\langle x,F(s,x)\rangle : x\,P\,a\}) \qquad (1)$$


Then there exists a unique function $F : S \times A \to X$ satisfying (1).


Proof. Apply Corollary VI.2.29 with a “G-function”, J, given by
$$J(s,a,f) = G(s, a, p_{23}(f))$$
where $p_{23} : \mathbb{U}_M \to \mathbb{U}_M$ is


$$p_{23}(f) = \begin{cases} \uparrow & \text{if } f \text{ is not a class of 3-tuples} \\ \{\langle \delta(\pi(z)), \delta(z)\rangle : z \in f\} & \text{otherwise} \end{cases}$$

VI.2.31 Corollary (Pure Recursion with Respect to a Well-Ordering and
with a Partial G). Let $< : A \to A$ be a left-narrow well-ordering, and G a (not
necessarily total) function $G : \mathbb{U}_M \to X$ for some class X. Then there exists a
unique function $F : A \to X$ satisfying (1)-(2) below:


(1) $(\forall a \in A)\, F(a) \simeq G(F \upharpoonright {<}(a))$,

(2) dom(F) is either A, or ${<}(a)$ for some $a \in A$.


“Pure recursion” refers to the fact that G has only one argument, the “history”
of F on the segment ${<}(a)$.


Proof. In view of Theorem VI.2.25, we need only prove (2). So let $\mathrm{dom}(F) \neq A$.
Let a in A be <-minimal (also minimum here, since < is total) such that†


$$F(a)\uparrow, \text{ i.e., } G(F \upharpoonright {<}(a))\uparrow \qquad (3)$$


Thus ${<}(a) \subseteq \mathrm{dom}(F)$. We will prove that $\mathrm{dom}(F) = {<}(a)$. Well, let instead
$b \in \mathrm{dom}(F) - {<}(a)$ be minimal.‡


By (3) and totalness of <, we have a < b. By choice of b,
$$(\forall x)(a \leq x \wedge x < b \to F(x)\uparrow)$$
Thus,
$$F \upharpoonright {<}(b) = F \upharpoonright {<}(a) \qquad (4)$$

Therefore


$$F(b) \simeq G(F \upharpoonright {<}(b)) \simeq G(F \upharpoonright {<}(a)) \quad \text{(by (4))}$$


contradicting (3), since $F(b)\downarrow$.


† Proof by auxiliary constant hiding between the lines.
‡ Another hidden proof by auxiliary constant.


VI.2.32 Example. Let $G : 2 \times \mathbb{U}_M \to 2$ (recall that 2 = {0, 1}) be


$$G(x,f) = \begin{cases} 1 & \text{if } x = 1 \wedge f = \emptyset \\ \uparrow & \text{otherwise} \end{cases}$$


and 2 be equipped with the “standard order” < (i.e., ∈) on ω. Then the recursive
definition


$$(\forall a \in 2)\, F(a) \simeq G(a, F \upharpoonright {<}(a))$$


yields the function $F = \{\langle 1,1\rangle\}$, whose domain is neither 2 nor a segment of
2. Thus the requirement of pure recursion in VI.2.31 is essential.†
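The two-point computation behind VI.2.32 can be replayed directly. In this small sketch (our encoding, not the text's) a partial function is a Python dict and an undefined value of G is signalled by the exception `Undefined`.

```python
class Undefined(Exception):
    """Signals that G(x, f) is undefined."""

def G(x, f):
    """G(x, f) = 1 if x = 1 and f is empty; undefined otherwise."""
    if x == 1 and f == {}:
        return 1
    raise Undefined

F = {}
for a in (0, 1):                       # 2 = {0, 1} in its standard order
    history = {b: F[b] for b in F if b < a}
    try:
        F[a] = G(a, history)
    except Undefined:
        pass                           # F(a) stays undefined

segments = [set(), {0}]                # the segments <(0) and <(1) of 2
```

F(0) is undefined because the first argument is not 1; the history at 1 is then empty, so F(1) = 1, and the domain {1} is neither 2 nor one of its segments.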


VI.2.33 Remark. In practice, recursive definitions with respect to a P that has
MC (IC) often have the form


$$F(s,x) \simeq \begin{cases} H(s) & \text{if } x \text{ is } P\text{-minimal} \\ G(s, x, \{\langle s,y,F(s,y)\rangle : y\,P\,x\}) & \text{otherwise} \end{cases}$$


This reduces to the case considered in VI.2.29 with a “G-function” $\widetilde{G}$ given by


$$\widetilde{G}(s,x,f) = \begin{cases} H(s) & \text{if } x \text{ is } P\text{-minimal} \\ G(s,x,f) & \text{otherwise} \end{cases}$$


A similar remark holds, regarding making the basis of the recursion explicit,
for all the forms of recursion that we have considered.


† Purity of recursion we tacitly took advantage of in the last step of the proof of VI.2.31. Imagine
what would happen if F’s argument were explicitly present in G: We would get
$G(b, F \upharpoonright {<}(b)) \simeq G(b, F \upharpoonright {<}(a))$, a dead end, since what we have is
$G(a, F \upharpoonright {<}(a))\uparrow$, not $G(b, F \upharpoonright {<}(a))\uparrow$.


VI.2.34 Example (The Support Function). The support function
$sp : \mathbb{U}_M \to \mathbb{U}_M$ gives the set of all urelements, sp(x), that took part in the formation of
some set x. For example,


$$sp(\emptyset) = \emptyset$$
$$sp(n) = \emptyset \text{ for every } n \in \omega \text{ (induction on } n)$$
$$sp(\omega) = \emptyset$$
$$sp(\{2, ?, \{\#, !, \omega\}\}) = \{?, \#, !\} \text{ for urelements } ?, \#, !$$


The existence and uniqueness of sp is established by the following recursive
definition:


$$sp(x) = \begin{cases} \{x\} & \text{if } x \text{ is an urelement} \\ \bigcup\{sp(y) : y \in x\} & \text{otherwise} \end{cases} \qquad (1)$$


That (1) is an appropriate recursion can be seen as follows: First, ∈ (the relation,
not the predicate) is left-narrow and has MC. Next, (1) can be put in standard
form (Corollary VI.2.28 in this case)


$$(\forall x \in \mathrm{dom}(sp))\, sp(x) = G(x, sp \upharpoonright {\in}(x))$$
(of course, ${\in}(x) = x$), where the total $G : \mathbb{U}_M \times \mathbb{U}_M \to \mathbb{U}_M$ is given by


$$G(x,f) = \begin{cases} \{x\} & \text{if } x \text{ is an urelement} \\ \emptyset & \text{otherwise, if } f \text{ is not a relation} \\ \bigcup \mathrm{ran}(f) & \text{in all other cases} \end{cases}$$


Note that in view of the discussion in Remark VI.2.26, we may introduce “sp”
as a new formal function symbol. □
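For hereditarily finite objects the recursion (1) for sp runs as written. The sketch below rests on an encoding that is our assumption, not the text's: urelements are Python strings and sets are frozensets. Since ω is not representable this way, the sample set uses the ordinal 1 where the example above uses ω.

```python
def sp(x):
    """Support of x: the urelements that took part in forming x."""
    if isinstance(x, str):                     # x is an urelement
        return frozenset({x})
    # union of the supports of the members; empty union for the empty set
    return frozenset().union(*(sp(y) for y in x))

empty = frozenset()
one = frozenset({empty})                       # the ordinal 1 = {0}
two = frozenset({empty, one})                  # the ordinal 2 = {0, 1}
example = frozenset({two, "?", frozenset({"#", "!", one})})
```

The ordinals built from the empty set have empty support, so they are pure sets in the sense of the definition that follows.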


VI.2.35 Definition (Pure Sets). A set with empty support is called a pure 
set. 


VI.2.36 Example (Mostowski’s Collapsing Function). Here is another function
on sets that is an important tool in the model theory of set theory. It is a
function $C : \mathbb{U}_M \times \mathbb{U}_M \to \mathbb{U}_M$ defined by


$$C(p,x) = \begin{cases} x & \text{if } x \text{ is an urelement} \\ \{C(p,y) : y \in p \wedge y \in x\} & \text{otherwise} \end{cases} \qquad (1)$$

This too can be introduced formally if desired (cf. VI.2.26). Note that p is a
parameter.


What does C do, i.e., what is C(p, x), to a set or urelement x in the context
of the “reference” set or urelement p?


Well, if x is an urelement, then C does not change it. In the contrary case, if
p is an urelement, then $y \in p$ is refutable and thus $C(p,x) = \emptyset$.

The interesting subcase is when p is a set. Suppose that $x \cap p = x' \cap p$
despite (possibly) $x \neq x'$. We get from (1)


$$C(p,x) = \{C(p,y) : y \in p \cap x\} = \{C(p,y) : y \in p \cap x'\} = C(p,x')$$


In other words, C “collapses” any two sets x and x′ if their (possible) differences
cannot be witnessed inside p. That is, an inhabitant of p, aware only of
members of p but of nothing outside p, cannot tell x and x′ apart on the basis
of extensionality (trying to find something in p that is in one of x or x′ but not
in the other).


Here is a concrete example: Let $p = \{\#, !, ?, \{\#, @, ?\}, \{\#, ?\}\}$, where #, !, ?,
@ are urelements, and let $x = \{\#, @, ?\}$ and $x' = \{\#, ?\}$. Now,


$$C(p,\#) = \#, \quad C(p,!) = !, \quad C(p,?) = ?, \quad C(p,@) = @$$


$$C(p,x) = \{C(p,y) : y \in p \cap x\} = \{C(p,y) : y = \# \vee y = ?\} = \{\#, ?\} = x'$$


while
$$C(p,x') = \{C(p,y) : y \in p \cap x'\} = \{C(p,y) : y = \# \vee y = ?\} = \{\#, ?\} = x'$$
and


$$C(p,p) = \{C(p,y) : y \in p\} = \{C(p,y) : y = \# \vee y = ! \vee y = ? \vee y = x \vee y = x'\} = \{\#, !, ?, \{\#, ?\}\}$$


Note that, in the place of the two original x and x′ of p, C(p, p) (the “collapsed
p”) only contains the common collapsed element, the set C(p, x) (= C(p, x′)).
Moreover, we note that the C(p, p) that we have just computed is transitive.
This is not a coincidence with the present p, but holds for all p:

Indeed, if C(p, p) is not an atom (case where p is an urelement), then it is a
transitive set (why set?). To verify, let next p be a set and (using conjunctional
notation)


$$a \in b \in C(p,p) = \{C(p,x) : x \in p\} \qquad (2)$$


Then b = C(p, x) for some $x \in p$ (III.8.7). Now, one case is where x is an
urelement, hence b = x (by (1)). Since $a \in b$ is now refutable,
$a \in b \in C(p,p) \to a \in C(p,p)$ follows.


The other case leads to
$$b = \{C(p,y) : y \in p \cap x\}$$


By the assumption $a \in b$, a = C(p, y) for some $y \in p \cap x \subseteq p$; hence (by (1))
$a \in C(p,p)$.

We conclude by putting the recursion (1) in standard form (that of Corollary
VI.2.29; hence C is total on $\mathbb{U}_M^2$, since G below is). Just take as “G-function”
$G : \mathbb{U}_M^3 \to \mathbb{U}_M$, given by


$$G(p,x,f) = \begin{cases} x & \text{if } x \text{ is an urelement} \\ \emptyset & \text{else, if } p \text{ is an atom} \\ \emptyset & \text{else, if } f \text{ is not a relation} \\ \mathrm{ran}\bigl(f \upharpoonright (\{p\} \times (p \cap x))\bigr) & \text{in all other cases} \end{cases}$$


The reader will have no trouble putting future recursions in “standard” form,
and we delegate to him all future instances of such an exercise. □
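The concrete computation of VI.2.36 can be checked mechanically for hereditarily finite data. The following is a minimal sketch of recursion (1), assuming (our encoding) that urelements are Python strings and sets are frozensets; `collapse` and `is_transitive` are illustrative names.

```python
def collapse(p, x):
    """C(p, x): urelements are strings, sets are frozensets."""
    if isinstance(x, str):                     # urelements are left unchanged
        return x
    if isinstance(p, str):                     # no y satisfies y in p
        return frozenset()
    return frozenset(collapse(p, y) for y in x if y in p)

def is_transitive(s):
    """Every member of a set-member of s is itself a member of s."""
    return all(y in s for m in s if isinstance(m, frozenset) for y in m)

x = frozenset({"#", "@", "?"})
x_prime = frozenset({"#", "?"})
p = frozenset({"#", "!", "?", x, x_prime})
collapsed_p = collapse(p, p)
```

As in the text, x and x′ collapse to the same set, so `collapsed_p` has four members rather than five, and it is transitive.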


VI.2.37 Remark. What lies behind the fact that C(p, p) is transitive, intuitively
speaking? Well, by “squeezing out” those elements of x (in p, such as @ above)
which do not help to establish the “identity” of x in p, we have left in x, in
essence, only those objects which (in squeezed form, of course) p “knows
about” (i.e., are its elements). The collapsed p (i.e., C(p, p)) has the hereditary
property: If x (set) is in it, then so are the members of x, and, repeating this
observation, so are the members of the members of x, and so on.


VI.2.38 Example (Continuation of Example VI.2.36). We now examine
whether the concrete p of the previous example is a possible “universe” of
sets and urelements, where we are content to live (mathematically speaking)
and “do set theory” (i.e., it is the underlying set of some model of ZFC).† We
discover that this potential “universe” has a disturbing property: Even though,
as inhabitants of p, we “know” about certain two of its members, x and x′,
that‡ $(\forall z)(z \in x \leftrightarrow z \in x')$, yet it happens that “really” $x \neq x'$. That is,
extensionality fails in this universe.
Let us call a set p extensional iff it satisfies (3) below (otherwise it is called
nonextensional):


$$(\forall u \in p)(\forall v \in p)\Bigl(\neg U(u) \wedge \neg U(v) \to \bigl((\forall z \in p)(z \in u \leftrightarrow z \in v) \to u = v\bigr)\Bigr) \qquad (3)$$


It turns out that if p is extensional to begin with, then by collapsing it, not only
do we turn it into a transitive set, but also, the new set C(p, p) is essentially the
same as p; its elements are obtained by a judicious renaming of the elements
of p, otherwise leaving the {}-structure of p intact.


† By our belief that ZFC is consistent (cf. II.4.5), set universes exist by the completeness theorem
of Chapter I. However, this p cannot be one of them, for extensionality fails in it.
‡ Caution: Since p is (supposed to be) the “universe”, “$(\forall z)$” here is short for “$(\forall z \in p)$”.

More precisely, there is a 1-1 correspondence between p and C(p, p) ($x \mapsto C(p,x)$
does the “renaming”) which preserves membership relationships (we
get, technically, an isomorphism with respect to ∈).


Let us prove this. If p is extensional, then
$$\lambda x.C(p,x) : p \to C(p,p)$$
is a 1-1 correspondence such that, for all x, y in p,
$$x \in y \quad\text{iff}\quad C(p,x) \in C(p,y) \qquad (4)$$
To this end, observe that $\lambda x.C(p,x) \upharpoonright p$ is, trivially, total and that


$$\mathrm{ran}(\lambda x.C(p,x) \upharpoonright p) = \{C(p,y) : y \in p\} = C(p,p)$$


so that $\lambda x.C(p,x) \upharpoonright p$ is onto as well. Also observe that since
$$C(p,y) = \{C(p,u) : u \in y \cap p\}$$
we have
$$x \in p \wedge x \in y \to C(p,x) \in C(p,y)$$


which is half of (4). To conclude we need to show the 1-1-ness of $\lambda x.C(p,x) \upharpoonright p$
as well as the if part of (4) above.


We show that
$$(\forall x)(\forall y)\,\mathscr{Q}(x,y) \qquad (5)$$


where $\mathscr{Q}(x,y)$ stands for


$$[C(p,x) = C(p,y) \to x = y] \;\wedge\; [C(p,x) \in C(p,y) \to x \in y] \;\wedge\; [C(p,y) \in C(p,x) \to y \in x]$$


and quantification is over p.
We argue by contradiction, assuming instead the negation of (5):


$$(\exists x)(\exists y)\,\neg\mathscr{Q}(x,y) \qquad (5')$$


The argument is extremely close to that of the proof of trichotomy (V.1.20).
So, let $x_0$ be ∈-minimal such that


$$(\exists y)\,\neg\mathscr{Q}(x_0,y) \qquad (6)$$


and, similarly, $y_0$ be ∈-minimal such that


$$\neg\mathscr{Q}(x_0,y_0) \qquad (7)$$

We will contradict (7), which says that


$$\bigl(C(p,x_0) = C(p,y_0) \wedge x_0 \neq y_0\bigr) \;\vee\; \bigl(C(p,x_0) \in C(p,y_0) \wedge x_0 \notin y_0\bigr) \;\vee\; \bigl(C(p,y_0) \in C(p,x_0) \wedge y_0 \notin x_0\bigr)$$
Case 1. $C(p,x_0) = C(p,y_0) \wedge x_0 \neq y_0$ (refutation of 1-1-ness). If either $x_0$
or $y_0$ is an urelement, then $x_0 = y_0$, a contradiction. Indeed, say $x_0$ is an atom.
Then $x_0 = C(p,x_0) = C(p,y_0)$, which forces $C(p,y_0)$ to be an urelement,
inevitably $y_0$ (why?). So let both $x_0$ and $y_0$ be sets.

We will prove that $x_0 = y_0$ to obtain a contradiction. Since p is extensional,
this amounts to proving $z \in x_0 \to z \in y_0$ and $z \in y_0 \to z \in x_0$ for the arbitrary
$z \in p$.

To this end, let $z \in x_0$, so (by (6)), $(\forall y)\,\mathscr{Q}(z,y)$ holds, in particular


$$\mathscr{Q}(z,y_0) \qquad (8)$$
By the only-if part of (4), already proved, $z \in x_0$ yields $C(p,z) \in C(p,x_0) = C(p,y_0)$.
Thus, by (8), $z \in y_0$.
One similarly proves $z \in y_0 \to z \in x_0$. However, it is instructive to include
the full proof here, so that we can make a comment.
Let $z \in y_0$. By (7),
$$\mathscr{Q}(x_0,z) \qquad (9)$$
By half of (4), $C(p,z) \in C(p,y_0) = C(p,x_0)$. By (9), $z \in x_0$.
Note that the inclusion of the seemingly redundant “$\wedge\,[C(p,y) \in C(p,x) \to y \in x]$”
in the definition of $\mathscr{Q}(x,y)$ ensures the symmetry in the roles of x, y.
In the absence of such symmetry, (9) would not help here.


Case 2. $C(p,x_0) \in C(p,y_0)$ (hence $y_0$ is not an urelement), yet $x_0 \notin y_0$.
Thus


$$C(p,x_0) = C(p,z) \qquad (10)$$


for some $z \in p \cap y_0$; hence (by (7)) $\mathscr{Q}(x_0,z)$; and (by (10)) $x_0 = z$, thus $x_0 \in y_0$,
contradicting the assumption.


Case 3. $C(p,y_0) \in C(p,x_0)$, yet $y_0 \notin x_0$. This case leads to a contradiction,
exactly like the previous one, establishing (5).

Thus, if $\mathfrak{A} = (A, U, \in)$ is a countable model of ZFC,† then $(C(A,A), U, \in)$
is an isomorphic (=, U and ∈ are “preserved”) transitive model, a so-called
CTM (countable transitive model). Cf. I.7.12.


† Such models exist by the Löwenheim-Skolem theorem of Chapter I, since the language of ZFC
is countable (granting that ZFC is consistent).


VI.2.39 Example (Example VI.2.36 Concluded). Let $p = \{\#, !, ?, \{\#, @, ?\}, \{\#, !\}\}$.
Set $x = \{\#, @, ?\}$ and $x' = \{\#, !\}$. This p is extensional, for $p \cap x \neq p \cap x'$.
One easily computes


$$C(p,x) = \{\#, ?\}$$
and
$$C(p,x') = \{\#, !\}$$


Thus the collapse of p, C(p, p), is $\{\#, !, ?, \{\#, ?\}, \{\#, !\}\}$.
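For this extensional p one can also confirm by machine that the renaming x ↦ C(p, x) is 1-1 and preserves membership. The sketch below uses an encoding that is our assumption (urelements as Python strings, sets as frozensets); `is_extensional` is an illustrative helper that compares the traces of set-members on p.

```python
def collapse(p, x):
    """C(p, x): urelements are strings, sets are frozensets."""
    if isinstance(x, str):
        return x
    if isinstance(p, str):
        return frozenset()
    return frozenset(collapse(p, y) for y in x if y in p)

def is_extensional(p):
    """Distinct set-members of p differ on some member of p."""
    sets = [u for u in p if isinstance(u, frozenset)]
    return all(u == v or (u & p) != (v & p) for u in sets for v in sets)

x = frozenset({"#", "@", "?"})
x_prime = frozenset({"#", "!"})
p = frozenset({"#", "!", "?", x, x_prime})

image = {u: collapse(p, u) for u in p}          # the renaming u -> C(p, u)
injective = len(set(image.values())) == len(p)
preserves_membership = all(
    (u in v) == (image[u] in image[v])
    for u in p for v in p if isinstance(v, frozenset))

# the nonextensional p of VI.2.36, for comparison
p_old = frozenset({"#", "!", "?", x, frozenset({"#", "?"})})
```

The membership test is restricted to set-valued v, since urelements have no members.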


We conclude this section with an extension of the previous recursive defini- 
tion schemata — which define one function — to the case where many functions 
are defined at once by simultaneous recursion. This tool — familiar to the worker 
in computability, where it goes back (at least) to Hilbert and Bernays (1968) — 
will be handy in our last chapter, on forcing. There are many variations that are 
left to the reader’s imagination. We just give two of the many possible schemata 
here and also restrict the number of functions that are simultaneously defined 
to just two (without loss of generality, as the reader will readily attest). 


VI.2.40 Corollary (Simultaneous Recursion with Respect to an Arbitrary
Relation with IC). Let P : A → A be a left-narrow relation (not necessarily
an order) with IC, and G₁ and G₂ (not necessarily total) functions A × 𝕌_M² →
X, for some class X. Then there exist unique functions F₁ and F₂, from A to X,
satisfying

(∀a ∈ A) F₁(a) ≃ G₁(a, {(x, F₁(x)) : x P a}, {(x, F₂(x)) : x P a})    (1)
and
(∀a ∈ A) F₂(a) ≃ G₂(a, {(x, F₁(x)) : x P a}, {(x, F₂(x)) : x P a})    (2)


Proof. We define functions p₁ and p₂ by

p₁(f) = ↑ if f is not a class of (x, (y, z))-type entries
p₁(f) = {(π(z), π(δ(z))) : z ∈ f} otherwise
and
p₂(f) = ↑ if f is not a class of (x, (y, z))-type entries
p₂(f) = {(π(z), δ(δ(z))) : z ∈ f} otherwise

and set
G = λxf.(G₁(x, p₁(f), p₂(f)), G₂(x, p₁(f), p₂(f)))
By VI.2.27 there is a unique F : A → X such that
(∀a ∈ A) F(a) ≃ G(a, {(x, F(x)) : x P a})


It is trivial to check (induction) that F is a class of (x, (y, z))-type entries
(equivalently, F(a)↓ implies that F(a) is a pair). Taking F₁ = π ∘ F and F₂ =
δ ∘ F, we have satisfied (1) and (2) respectively.
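
The pairing trick of this proof can be illustrated on a finite example. The sketch below is not the formal construction: it assumes a finite set A, a relation P given as a Python predicate with no infinite descending chains, and total G₁, G₂. The names `recurse` and `simultaneous`, and the even/odd example, are ours.

```python
def recurse(A, P, G):
    """Single recursion: F(a) = G(a, {x: F(x) for all x with x P a})."""
    F = {}

    def value(a):
        if a not in F:
            F[a] = G(a, {x: value(x) for x in A if P(x, a)})
        return F[a]

    for a in A:
        value(a)
    return F


def simultaneous(A, P, G1, G2):
    """Obtain F1 and F2 from one recursion on pairs, mirroring the proof of VI.2.40."""
    def G(a, f):
        f1 = {x: pair[0] for x, pair in f.items()}   # plays the role of p1(f)
        f2 = {x: pair[1] for x, pair in f.items()}   # plays the role of p2(f)
        return (G1(a, f1, f2), G2(a, f1, f2))

    F = recurse(A, P, G)
    F1 = {a: pair[0] for a, pair in F.items()}       # first projection of F
    F2 = {a: pair[1] for a, pair in F.items()}       # second projection of F
    return F1, F2


A = range(6)
P = lambda x, a: x + 1 == a                            # x P a iff x is the predecessor of a
G1 = lambda a, f1, f2: True if a == 0 else f2[a - 1]   # "even" is defined from "odd"
G2 = lambda a, f1, f2: False if a == 0 else f1[a - 1]  # "odd" is defined from "even"
is_even, is_odd = simultaneous(A, P, G1, G2)
```

Each of the two functions refers to the other one at earlier arguments, which is exactly the situation that the two schemata (1) and (2) describe.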


VI.2.41 Corollary (Simultaneous Recursion with a Total G). Let P : A → A
be a left-narrow relation (not necessarily an order) with IC, and G₁ and G₂
total functions A × 𝕌_M² → X for some class X. Then there exist unique total
functions F₁ and F₂ from A to X satisfying


(∀a ∈ A) F₁(a) = G₁(a, F₁ ↾ P(a), F₂ ↾ P(a))
and


(∀a ∈ A) F₂(a) = G₂(a, F₁ ↾ P(a), F₂ ↾ P(a))
VI.3.1 Example (Informal). Consider R ⊆ A × A, where A = {1, 2, 3} and
the relation R = {(1, 2), (2, 3), (1, 3)}. Also consider S ⊆ B × B, where B =
{a, b, c} and S = {(a, b), (b, c), (a, c)}.

(A, R) and (B, S) are PO sets (indeed, WO sets). What is interesting here is 
that once we are given the first PO set, the second one does not offer any new 
information as far as partial order or, indeed, well-ordering is concerned. This 
observation holds true if “first” and “second” are interchanged. 

This is because (B, S) is obtained from (A, R) by a systematic renaming
of objects (1 ↦ a, 2 ↦ b, 3 ↦ c) which preserves order. That is, f = {(1, a),
(2, b), (3, c)} is a 1-1 correspondence A → B such that x R y iff f(x) S f(y).

Since such a correspondence exhibits the fact that (A, R) and (B, S)
have the same “shape”, or “form” (loosely translated into Greek, the same
“μορφή”, or “morphé” in transliteration), it has been given the standard name
isomorphism.†


† Strictly speaking, order isomorphism in this case, since the concept of isomorphism extends to
other mathematical structures as well. The prefix “iso” in the term comes from the Greek word
ίσο, which means “equal” or “identical”.

If we have complete knowledge of (A, R) (respectively (B, S)), it is as good
as having complete knowledge of (B, S) (respectively (A, R)). It suffices to
study any convenient one out of many mutually order-isomorphic PO sets (or
LO sets or WO sets).


VI.3.2 Example (Informal). (N, <) and ({−2, −1} ∪ N, <) are order-isomorphic
PO sets, where the “<” is the standard order (they are also order-isomorphic
WO sets).

Indeed, if we let f : N → {−2, −1} ∪ N be λx.x − 2, then clearly f is a
1-1 correspondence and i < j iff f(i) < f(j) for all i, j in N.


Now some definitions and some useful results. 


VI.3.3 Informal Definition. Let (A, S) and (B, T) be two PO classes. A 1-1
correspondence f : A → B is an order-isomorphism just in case


x S y iff f(x) T f(y) for all {x, y} ⊆ A.


(A, S) and (B, T) are called order-isomorphic. We write (A, S) ≅ (B, T).


We will drop the qualification “order-” from “order-isomorphic” as long as the 
context ascertains that there is no other type of isomorphism in consideration. 


We often abuse language (and notation) in those cases where the orders S (on
A) and T (on B) are clearly understood from the context. We then say simply
that A and B are isomorphic and write A ≅ B. As usual, the negation of ≅ is
written ≇.


A notion related to isomorphism is that of an order-preserving function. 


VI.3.4 Informal Definition. Let (A, S) and (B, T) be two PO classes, and
f : A → B be total. If, on the assumption that x ∈ A ∧ y ∈ A, the implication
x S y → f(x) T f(y) holds, then f is called order-preserving.


VI.3.5 Remark. Operationally, “holds” above can only be certified by a ZFC
proof (when such proof is possible). Correspondingly, the act of assuming, in
the course of a proof, that a certain total f : A → B is order-preserving between
the PO classes (A, S) and (B, T) is tantamount to adding the axiom


x ∈ A ∧ y ∈ A ∧ x S y → f(x) T f(y)


If we use the more natural notation <₁ and <₂ for S and T respectively, then
the above definition says that x <₁ y → f(x) <₂ f(y) is the condition for a
total f to be order-preserving. □


VI.3.6 Example (Informal). Let A = {a, b, c, d} be equipped with the order
<₁ so that (just) a <₁ b. Let B = {1, 2} be equipped with the order <₂ so that
(just) 1 <₂ 2.

Define f = {(a, 1), (b, 2), (c, 1), (d, 1)}. Clearly, f : A → B is order-preserving,
since the statement


(∀x ∈ A)(∀y ∈ A)(x <₁ y → f(x) <₂ f(y))


is true. However, note that f is not 1-1; hence it is not an isomorphism. □


VI.3.7 Example (Informal). Let A = {a, b, c, d} be equipped with the order
<₁ so that (just) a <₁ b. Let B = {1, 2, 3, 4} be equipped with the order
<₂ so that 1 <₂ 2 <₂ 3 <₂ 4 (employing “<₂” conjunctionally). Define g as
{(a, 1), (b, 2), (c, 3), (d, 4)}. Then g is order-preserving. It is also a 1-1
correspondence, but not an isomorphism, since g(c) <₂ g(d) but c ≮₁ d. In fact, c
and d are non-comparable under <₁. □
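
For finite PO sets, Definitions VI.3.3 and VI.3.4 can be checked by brute force. The sketch below replays Examples VI.3.1, VI.3.6 and VI.3.7. It is an illustration only; it assumes that an order is given as a set of pairs (x, y) meaning x < y and a function as a Python dict, and the helper names are ours.

```python
def is_order_preserving(f, lt1, lt2):
    """x <1 y implies f(x) <2 f(y), for every pair (x, y) in lt1."""
    return all((f[x], f[y]) in lt2 for (x, y) in lt1)


def is_isomorphism(f, A, lt1, B, lt2):
    """f is a 1-1 correspondence A -> B with x <1 y iff f(x) <2 f(y)."""
    one_to_one_onto = sorted(f) == sorted(A) and sorted(f.values()) == sorted(B)
    return one_to_one_onto and all(
        ((x, y) in lt1) == ((f[x], f[y]) in lt2) for x in A for y in A)


# Example VI.3.1: a renaming of a three-element WO set.
h_iso = is_isomorphism({1: "a", 2: "b", 3: "c"},
                       [1, 2, 3], {(1, 2), (2, 3), (1, 3)},
                       ["a", "b", "c"], {("a", "b"), ("b", "c"), ("a", "c")})

A = ["a", "b", "c", "d"]
lt1 = {("a", "b")}                                   # (just) a <1 b

# Example VI.3.6: order-preserving, but not 1-1.
f = {"a": 1, "b": 2, "c": 1, "d": 1}
f_preserving = is_order_preserving(f, lt1, {(1, 2)})
f_iso = is_isomorphism(f, A, lt1, [1, 2], {(1, 2)})

# Example VI.3.7: an order-preserving 1-1 correspondence that is not an isomorphism.
B = [1, 2, 3, 4]
lt2 = {(i, j) for i in B for j in B if i < j}
g = {"a": 1, "b": 2, "c": 3, "d": 4}
g_preserving = is_order_preserving(g, lt1, lt2)
g_iso = is_isomorphism(g, A, lt1, B, lt2)
```

The check for g fails precisely at the pair (c, d), which is unrelated under <₁ although its image is related under <₂.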


VI.3.8 Proposition. Let (A, <₁) be a LO class, (B, <₂) be a PO class, and
f : A → B be order-preserving (see VI.3.5 for the interpretation of these
assumptions). Then


(a) f is 1-1, and
(b) f is an isomorphism between (A, <₁) and (ran(f), <₂).


Proof. (a): Let x ≠ y in A. As <₁ is a linear order, we have x <₁ y or y <₁ x;
let us examine only the latter case. Then f(y) <₂ f(x); hence f(x) ≠ f(y),
since the orders are irreflexive.


(b): Let f(x) <₂ f(y) in ran(f). Since this implies f(x) ≠ f(y), we must
have x ≠ y by single-valuedness of f. Thus we will have x <₁ y or y <₁ x.

Let the latter be the case. Then also f(y) <₂ f(x); hence by the assumption
and transitivity of <₂, f(x) <₂ f(x), a contradiction.

We conclude that x <₁ y is the only possible case. Therefore we have
established that f(x) <₂ f(y) → x <₁ y, which, along with f being order-preserving,
establishes f as an isomorphism of LO classes.


VI.3.9 Remark. A function such as f is called an embedding. It embeds
(A, <₁) into (B, <₂) in the sense that it shows the former to be an
isomorphic copy of a subclass of B (here ran(f)), where, of course, this subclass
is equipped with the same order as B, namely <₂.
If ran(f) = B, then the embedding is an isomorphism. □


VI.3.10 Corollary. Let (A, <₁) be a LO class, B a class, and f : A → B a
1-1 correspondence. Define x <₂ y on B by f⁻¹(x) <₁ f⁻¹(y).
Then (B, <₂) is a LO class that is isomorphic to (A, <₁).


VI.3.11 Remark. We say that the order <₂ on B is induced by f (and <₁). □


VI.3.12 Corollary. If <₁ in VI.3.10 is a well-ordering, then (B, <₂) is a WO
class isomorphic to (A, <₁).


Proof. Let ∅ ≠ X ⊆ B, and let a = min(f⁻¹[X]). Now, x ∈ X implies f⁻¹(x) ∈
f⁻¹[X]; hence a ≤₁ f⁻¹(x); thus f(a) ≤₂ x. That is, f(a) = min(X).


VI.3.13 Proposition. Let (A, <) be a PO class with MC, and f : A → A be
order-preserving. Then there is no x ∈ A such that f(x) < x.


Proof. Assume the contrary, and let m be minimal in B = {x ∈ A : f(x) < x}.
Thus,


f(m) < m    (1)
Since f is order-preserving, (1) yields


f(f(m)) < f(m)    (2)
By (2), f(m) ∈ B, which by (1) contradicts the minimality of m.
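
On a small finite PO set the proposition can be confirmed exhaustively. The sketch below uses, as an assumed example of our own, the set {1, 2, 3, 6} ordered by proper divisibility. It enumerates all 256 functions A → A, keeps the order-preserving ones, and records every x with f(x) < x.

```python
from itertools import product

A = [1, 2, 3, 6]
# x < y iff x properly divides y (a partial order that is not linear: 2 and 3 are incomparable)
less = {(x, y) for x in A for y in A if x != y and y % x == 0}

preserving = 0
violations = []                      # pairs (f, x) with f order-preserving and f(x) < x
for values in product(A, repeat=len(A)):
    f = dict(zip(A, values))
    if all((f[x], f[y]) in less for (x, y) in less):
        preserving += 1
        violations += [(f, x) for x in A if (f[x], x) in less]
```

Only four of the 256 functions are order-preserving here (1 and 6 must be fixed, while 2 and 3 may each go to 2 or to 3), and none of them moves any element strictly downwards.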


VI.3.14 Remark. Another way to see the reason for the above is to observe
that if for any a ∈ A


f(a) < a
holds, then
··· < f(f(f(a))) < f(f(a)) < f(a) < a
is an infinite descending chain, contradicting VI.2.11. □


VI.3.15 Corollary. If (A, <) is a WO class and f : A → A is order-preserving,
then (∀x ∈ A) x ≤ f(x).


The following two corollaries use the notion of segment in their formulation
(see VI.2.24).


VI.3.16 Corollary. There is no isomorphism between a WO class and one of 
its segments. 


Proof. Say (A, <) is a WO class and f : A → <(a) is an isomorphism, where
a ∈ A. Then f(a) ∈ <(a), that is, f(a) < a, contradicting VI.3.13.


VI.3.17 Corollary. Given a WO class (A, <). If a ∈ A and <(a) ⊂ A, then
there is no isomorphism f : A → <(a).


Proof. Let b ∈ A − <(a). Thus a ≤ b; hence (conjunctionally) f(b) < a ≤ b,
contradicting VI.3.13.


VI.3.18 Corollary. If (A, <) is a WO class, and f : A → A is an isomorphism,
then f = Δ_A (the identity on A).


Proof. Let instead f(a) ≠ a for some a ∈ A. If a < f(a), then applying the
order-preserving f⁻¹ to both sides, we get f⁻¹(a) < a, contradicting VI.3.13.
For the same reason, the hypothesis f(a) < a is rejected outright.


VI.3.19 Corollary. If (A, <₁) and (B, <₂) are isomorphic WO classes, then
there is exactly one isomorphism f : A → B.


Proof. Let f : A → B and g : A → B be isomorphisms. It is trivially verifiable
that g⁻¹ ∘ f : A → A is an isomorphism.

By VI.3.18, g⁻¹ ∘ f = Δ_A; hence f = g, since both functions are 1-1
correspondences.


The next result shows, on one hand, that if two WO classes are not isomor- 
phic, then one properly contains (an isomorphic copy of) the other, i.e., the 
“smaller” of the two is embeddable in the “larger”. On the other hand, it shows 
that every WO class has the structure of (i.e., is isomorphic to) a segment. 


VI.3.20 Theorem. Let (A, <₁) and (B, <₂) be any WO classes and <₁ be
left-narrow. Then exactly one of the following cases obtains:


(a) The two WO classes are isomorphic,
(b) (A, <₁) is isomorphic to a segment of (B, <₂),
(c) (B, <₂) is isomorphic to a segment of (A, <₁).
Proof. By VI.3.16 no two of the above three cases are possible at once. It
remains to prove the disjunction of (a)-(c). Intuitively, we start off by pairing
min(A) with min(B). Then we pair the “next larger” element of A with that of
B. We continue in this way until either we run out of elements from A and B
simultaneously, or deplete A first, or deplete B first (these cases correspond to
the ones enumerated (a)-(c) in the theorem).

Formally now, if any of A or B is ∅, then the result is trivial. So let A ≠
∅ ≠ B, and apply the pure recursion (VI.2.31) to define the function F : A → B
by


(∀x ∈ A) F(x) ≃ min{y : y ∈ B − ran(F ↾ <₁(x))}    (1)


Next, we establish that F : dom(F) → ran(F) is an isomorphism. By VI.3.8, it
suffices to show that it is order-preserving on its domain. To see this we show
that


(∀y)(y ∈ ran(F ↾ <₁(x)) → y <₂ F(x))    (2)
Assume instead (recall that <₂ is total)
(∃y)(y ∈ ran(F ↾ <₁(x)) ∧ F(x) ≤₂ y)    (2′)
Because of (2′) we may add the assumption
c ∈ ran(F ↾ <₁(x)) ∧ F(x) ≤₂ c    (3)


where c is a new constant. Let b ∈ <₁(x) (another auxiliary constant) be such that
F(b) = c. By (1),


z ∈ B − ran(F ↾ <₁(b)) → F(b) ≤₂ z    (4)


By (1) again, F(x) ∉ ran(F ↾ <₁(x)) (in particular, y <₁ x → F(y) ≠ F(x),
i.e., F is 1-1); hence F(x) ∉ ran(F ↾ <₁(b)) by <₁(b) ⊆ <₁(x).
Thus


F(b) <₂ F(x)


by (1), (4) and 1-1-ness of F (the last property sharpens “≤₂” to “<₂”). This
contradicts (3) since c = F(b). We have established (2).


By VI.2.31 we have one of
dom(F) = A    (5)
or


dom(F) = <₁(a) for some a ∈ A    (6)
Before proceeding we show that


x ∈ ran(F) → <₂(x) ⊆ ran(F)    (7)
This is trivial if ran(F) = B. Let then x ∈ ran(F), and take
c ∈ B − ran(F)    (8)


(a new auxiliary constant) such that
c <₂ x    (9)
Let x = F(y). By (1) and (8)
F(y) <₂ c


contradicting (9). This settles (7). We immediately conclude that


If ran(F) ≠ B, then ran(F) = <₂(b), where b = min{y : y ∈ B − ran(F)}.


Suppose now that (5) is the case. If also ran(F) = B, then we are done in this
case. If on the other hand ran(F) ≠ B, then ran(F) is a segment by the above,
so we are done in this case.


Suppose finally that (6) is the case. Thus ran(F) is either all of B or a segment,
<₂(b).

We will retire the proof if we show this latter subcase to be untenable: Indeed, 
the function F U {(a, b)} properly extends F, still satisfying (1) — 


Pause. Do you believe this? 


contradicting uniqueness of F (VI.2.31). 
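
For finite WO sets the recursion (1) is just a greedy pairing, and the three cases of the theorem can be read off the result. The sketch below is an illustration under our own assumptions: each WO set is given as a Python list (without repetitions) in ascending order of its own well-ordering, and the function name is hypothetical.

```python
def compare_well_orders(A, B):
    """A and B are lists, each listed in ascending order of its own well-ordering."""
    F = {}
    for x in A:                                          # visit A in ascending order
        unused = [y for y in B if y not in F.values()]   # B - ran(F restricted to the segment of x)
        if not unused:                                   # F(x) is undefined: B is exhausted
            break
        F[x] = unused[0]                                 # the minimum of the unused part of B
    if len(F) == len(A) == len(B):
        case = "a"                                       # the two WO sets are isomorphic
    elif len(F) == len(A):
        case = "b"                                       # A is isomorphic to a segment of B
    else:
        case = "c"                                       # B is isomorphic to a segment of A
    return case, F


case_a, F_a = compare_well_orders([1, 2, 3], ["a", "b", "c"])
case_b, F_b = compare_well_orders([1, 2], ["a", "b", "c"])
case_c, F_c = compare_well_orders([1, 2, 3], ["a"])
```

In case (b) the range of F is the segment of B determined by the least element of B that F misses; in case (c) the domain of F is a segment of A, in agreement with (6).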


VI.3.21 Remark. The above theorem can form the basis for the comparability 
of ordinals of the next section. Alternatively, one can prove the comparability 
of ordinals directly and derive VI.3.20 (for WO sets) as a corollary (VI.3.23 
below). We will return to this remark in the next section. 


VI.3.22 Exercise. 


(i) If <₂ is known not to be left-narrow (the statement of the theorem allows
either possibility), then how are cases (a)-(c) affected?

(ii) Suppose that <₂ is left-narrow as well, and A and B are proper classes.
What now?
VI.3.23 Corollary. Let (A, <₁) and (B, <₂) be any WO sets. Then exactly one
of the following cases obtains:


(a) The two WO sets are isomorphic,
(b) (A, <₁) is isomorphic to a segment of (B, <₂),
(c) (B, <₂) is isomorphic to a segment of (A, <₁).

Let (A, <) be a WO set, where A ≠ ∅. Let a₀ = min(A). If A − {a₀} ≠ ∅, then
let a₁ = min(A − {a₀}). In general, if A − {a₀, a₁, ..., aₙ} ≠ ∅, then define
aₙ₊₁ to be min(A − {a₀, a₁, ..., aₙ}).

Possibly, for some smallest n ∈ N, A − {a₀, a₁, ..., aₙ} = ∅, and thus A =
{a₀, a₁, ..., aₙ}, so that a₀ < a₁ < ··· < aₙ.

Another possibility, when A is (intuitively†) infinite, is that we will need
exactly all the natural numbers in N in order to name the positions of the elements
of A in their (ascending) <-order; that is, A = {a₀, a₁, ...} and a₀ < a₁ < ···.

Is it possible that a WO set is so “long” that we will run out of position
names (from N) before we run out of positions in A? The answer (affirmative)
is straightforward:


VI.4.1 Example (Informal). Adjoin to N (equipped with the “natural order”
< = {(i + 1, i) : i ∈ N}) a new object. For example, adjoin the new object N
(“new” in the sense that N ∉ N) to form A = N ∪ {N}.

Next, extend < to <_A on A by <_A = < ∪ {(N, i) : i ∈ N}. That is, i <_A N for
all i ∈ N.

The requirement that the object N have a position immediately after all the
i’s in N forces us to run out of position names (supplied from N) when we are
naming the positions of the elements of the WO set A. Object N is the first in
A that has no position name, if the name supply is just N.

Mathematicians use the name ω (the same one used for the set of formal
natural numbers) to name the position of N in A (that is, the first position
after positions 0, 1, 2, ...). Thus A is the “ordered sequence” a₀ <_A a₁ <_A
a₂ <_A ··· <_A a_ω.

We can carry this further. We can imagine a WO set (B, <_B) that is so long
that it requires yet another position name after ω. We call this new position ω + 1,
so that B is the ordered sequence b₀ <_B b₁ <_B b₂ <_B ··· <_B b_ω <_B b_{ω+1}.


† We are going to formalize the notions “finite” and “infinite” in Chapter VII.

Similarly, for still longer WO sets one invents position names ω + 2, ω + 3,
etc.

What would be the name for the position immediately after all the ones
named ω + i (i ∈ N)? Mathematicians have invented the name “ω · 2”.


These position names of WO set elements are the so-called ordinals (also 
called ordinal numbers). They provide (among other things) an extension of 
the position-naming apparatus that N is. 

In order to eventually come up with a well-motivated formal definition
of ordinals, let us speculate a bit further on their nature. Extrapolating from
the discussion of Example VI.4.1, let us imagine a sequence of position
names 0, 1, ..., ω, ω + 1, ..., ω · 2, ω · 2 + 1, ..., ω · 3, ω · 3 + 1, ... of sufficient
length so that the elements of any WO set (A, <) can fit, in ascending
order (with respect to the WO set’s own “<”) contiguously from left to right in
named position slots (starting with the 0th position slot).

Once we have so fitted (A, <), let the ordinal α be the first unused position
name. This α characterizes the “form” or “type” of the WO set (A, <), in the
sense that if (B, <′) is another WO set such that (A, <) ≅ (B, <′), then the
elements of B, in view of


A : a₀ < a₁ < ··· < aᵢ ...


B : b₀ <′ b₁ <′ ··· <′ bᵢ ...


will occupy exactly the same positions as the A-elements, and thus, once again,
α will be the first unused position name.


Hence, a formal definition of ordinals must ensure that they are objects of set
theory associated with WO sets in such a way that the same ordinal corresponds
to each WO set in a class of pairwise isomorphic WO sets. That is, one looks
for a function ‖...‖, defined on all WO sets, such that


‖(A, <₁)‖ = ‖(B, <₂)‖ iff (as WO sets) (A, <₁) ≅ (B, <₂)
The range of ‖...‖ will be the class of all ordinals, which turns out to be a
proper class. □


The above observations led to Cantor’s original definition: 


VI.4.2 Tentative Definition. (See Wilder (1963, p. 111).) The ordinal or ordinal
number of a WO set (A, <) is the class of all WO sets (B, <′) such that
(A, <) ≅ (B, <′).


The “permanent” definition will be given in VI.4.16. □


VI.4.3 Remark. The reader can readily verify that ≅ is an equivalence relation
on the class of all WO sets. Thus, the above definition adopts (A, <) ↦
[(A, <)]≅ (recall the notation introduced in V.4.3) as the function ‖...‖.

It turns out that the equivalence classes [...]≅ are too big to be sets
(Exercise VI.7), so that they are inappropriate as formal objects of the
theory.


Therefore we try next 


VI.4.4 Tentative Definition. (See Kamke (1950, p. 57).) The ordinal or ordinal
number of a WO set (A, <) is an arbitrary representative out of [(A, <)]≅.


The new definition gets around the difficulty mentioned in VI.4.3. However, 
it creates a great sense of uncertainty with the indefinite (“an arbitrary represen- 
tative’) manner in which an ordinal is “defined”. To conclude this discussion 
that peeks into the history of the development of ordinals (mostly by Cantor), 
let us try and fix the latest tentative definition (VI.4.4) so that we can appreciate 
that the old-fashioned way of introducing ordinals could be made to work. We 
will fix the definition and follow up some of its early consequences. Once this 
is done, we will have on hand enough motivational ideas to start from scratch 
with von Neumann’s modern definition. The reader will benefit from knowing 
both points of view. 


Warning. All these tentative definitions are informal and deal with
metamathematical concepts. □


VI.4.5 Tentative Definition. The ordinal or ordinal number of a WO set
(A, <), in symbols ‖(A, <)‖, is that element of [(A, <)]≅ picked up by the
principle† of global (strong) choice. “On” denotes the class of all ordinals.


We are not committing ourselves above to an assumption that we have strong
or global choice, an assumption that would entail (indeed, would be equivalent
to; see the informal discussion in Section IV.2) the well-orderability of 𝕌_M.

The reader is strongly reminded that, until the definitive definition of ordinals 
in VI.4.16 below, all that these tentative attempts towards a definition do is to 
outline briefly the history that led to the definitive definition. For this reason, 
any auxiliary assumptions introduced to make these tentative definitions tenable 
will be discarded as soon as we reach VI.4.16. □


† We avoid the term “axiom”. The reason is explained in the commentary following the definition.

We have the following trivial consequence of this definition:


VI.4.6 Proposition (Informal). ‖(A, <₁)‖ = ‖(B, <₂)‖ iff (A, <₁) ≅ (B, <₂)
for any two WO sets.


VI.4.7 Remark. All along, when we wrote (A, R) for a set A equipped with a
relation R ⊆ A × A, the symbol (..., ...) was used informally, simply to remind
us of the two ingredients of the situation, namely A and R (see also the footnote
to VI.1.11).

In instances such as Tentative Definitions VI.4.2-VI.4.5, for example in uses
such as ‖(A, R)‖, one would expect to use the formal ⟨A, R⟩ instead, so that the
“pair” of A and R is an object of the theory (a set). However, we will continue
using round brackets to denote PO sets, as we have previously agreed to do.

Ordinals will be denoted by lowercase Greek letters, in general. Notation
for specific ordinals may differ (see the following example). □


VI.4.8 Example (Informal). What is ‖({0, 1}, <)‖, where < is the standard
order (∈) on ω? According to VI.4.5, it is whichever WO set of exactly two
elements (say, ({a, b}, {(b, a)}) for some a ≠ b) strong AC will pick out of
the class [({0, 1}, <)]≅. We naturally use a standard name, the symbol “2”, to
denote the ordinal of a WO set of two elements. This is summed up as


‖({a, b}, {(b, a)})‖ = ‖(2, <)‖ = 2


since 2 = {0, 1}.

Similarly, the symbol “n” (in ω) will denote the ordinal ‖(n, <)‖, where
again, < = ∈. Finally, we have already remarked that ω will be the short name
for the ordinal ‖(ω, ∈)‖. □


Next, we consider the ordering of ordinals. 


VI.4.9 Tentative Definition. An order, <, is defined on On as follows: Let α
and β be two ordinals. Then α < β iff α is isomorphic to a segment of β.


VI.4.10 Remark (Informal). Recall that α = (A, <₁) and β = (B, <₂) for
some appropriate A, B, <₁, <₂. Now, intuitively, (A, <₁) can be embedded
into (B, <₂) as a segment iff the sequence


B : b₀ <₂ b₁ <₂ ···
is longer than the sequence
A : a₀ <₁ a₁ <₁ ···


and hence, iff the position immediately to the right of the A-sequence is to the
left of the position immediately to the right of the B-sequence. The italicized
text says, intuitively, that α < β; therefore the above definition is consistent
with our view that the ordinal of a WO set is the position name of the first
position to the right of the set.


VI.4.11 Proposition (Informal). If α and β are ordinals, then exactly one of
α < β, α = β, β < α holds.


Proof. Let α = (A, <₁) and β = (B, <₂). By VI.3.23, exactly one of the following
holds:


(a) (A, <₁) ≅ (B, <₂),
(b) (A, <₁) is isomorphic to a segment of (B, <₂),
(c) (B, <₂) is isomorphic to a segment of (A, <₁).


(b) and (c) say α < β and β < α, respectively, by VI.4.9.
By (a), both (A, <₁) and (B, <₂) are in the same equivalence class. Since
strong choice picks “deterministically” a unique representative from each
equivalence class, and each of (A, <₁) and (B, <₂) is a representative, it follows that
(A, <₁) = (B, <₂), i.e., α = β.


VI.4.12 Proposition (Informal). On is well-ordered by “<” of VI.4.9.


Proof. By VI.4.11, < is total. By VI.3.16, < is irreflexive. The reader can verify
that it is also transitive (Exercise VI.8). Therefore, < is a linear order.


Let next ∅ ≠ A ⊆ On (A need not be a set). Let α ∈ A. If α = min(A),† then
we are done; otherwise X = {β ∈ A : β < α} is nonempty. Let α = (Y, <₁).
Next, if β = (Z, <₂) ∈ X, then (by VI.4.9) there is a unique


Pause. Why “unique”?
y_β ∈ Y such that β ≅ (<₁(y_β), <₁). By collection, X is a set.


We show that there is a minimum β in X. If not, then (VI.2.13) there is an
infinite descending <-chain in X:


··· < β₃ < β₂ < β₁    (1)


† The terms “minimum” and “minimal” are interchangeable, since < is total (VI.1.19).

This induces the infinite descending <₁-chain in Y:


··· <₁ y_{β₃} <₁ y_{β₂} <₁ y_{β₁}    (2)


where y_{βᵢ} is chosen as above to satisfy


βᵢ ≅ (<₁(y_{βᵢ}), <₁)    (3)


(2) contradicts the fact that (Y, <₁) is a WO set (VI.2.13), and we have shown
that X has a minimal element, as long as we can verify that the inequalities
in (2) indeed hold.

To this end, let β < γ in X, where β = (Z, <₂), γ = (W, <₃). We have


γ ≅ (<₁(y_γ), <₁)    (4)
and
β ≅ (<₁(y_β), <₁)    (5)
Also, by β < γ,
β ≅ (<₃(u), <₃), where u ∈ W    (6)


Observe that y_β ≠ y_γ (otherwise, β ≅ γ from (4) and (5), whence β = γ
by VI.4.6, contradicting β < γ (irreflexivity)).

Assume now that y_γ <₁ y_β. Then γ, that is (W, <₃), is isomorphic to a segment
of (<₁(y_β), <₁) by (4), and therefore to a segment of (<₃(u), <₃) by (5)
and (6). That is, (W, <₃) ≅ (<₃(v), <₃), where v <₃ u, contradicting VI.3.16.
Thus, y_β <₁ y_γ.


Let then β be the <-minimal in X. We claim that β is <-minimal (also
<-minimum) in A, which will rest the case. Indeed, if not, then for some γ in
A, γ < β. Then γ ∈ X by transitivity of <, contradicting the minimality of β
in X.

VI.4.13 Proposition (Informal Normal Form Theorem). For each α ∈ On,


{β : β < α} ≅ α.


Since we have adopted the convention that lowercase Greek letters stand for
ordinals, we will use the shorthand “{β : ...}” for “{β ∈ On : ...}”. Also, recall
that, for now, α = (A, <₁) for some A (set), so that the <₁-ingredient is
incorporated in the notation “··· ≅ α”. In writing “{β : ...} ≅ ···”, however, we are
slightly abusing notation, since we ought to have written “({β : ...}, <) ≅ ···”
instead, where < is the order on On defined in VI.4.9. This type of notational
abuse is common when the order is clearly understood (this echoes the remark
following VI.3.3). □


Proof. Let α = (Y, <₁) and X = {β : β < α}.† As in the proof of VI.4.12, for
each β ∈ X we pick a y_β ∈ Y such that


β ≅ (<₁(y_β), <₁)    (1)


Consider the relation F = {(β, y_β) : β ∈ X}. It is single-valued in y_β, for if also
β ≅ (<₁(y′_β), <₁) and (without loss of generality) y′_β <₁ y_β, then


(<₁(y′_β), <₁) ≅ (<₁(y_β), <₁)


contrary to VI.3.16.


We saw in the proof of VI.4.12 that F is order-preserving (γ < β → y_γ <₁
y_β); hence (by VI.3.8)


(X, <) ≅ (ran(F), <₁) (via F)    (2)


where ran(F) ⊆ Y. Now, if y ∈ Y and β = ‖(<₁(y), <₁)‖, then β < α by VI.4.9;
hence β ∈ X and F(β) = y. This shows that F is onto Y.


VI.4.14 Proposition (The (Informal) Burali-Forti Antinomy). On is not a 
set. 


Proof. In the contrary case, (On, <) is a WO set, by VI.4.12. So let


α = ‖(On, <)‖
Thus,
(On, <) ≅ α    (1)
By VI.4.13,
(<(α), <) ≅ α    (2)


(1) and (2) yield (On, <) ≅ (<(α), <), contradicting VI.3.16.


† This is a different X from the one employed in the proof of VI.4.12.

VI.4.15 Remark. The Burali-Forti antinomy is the first contradiction of naive
set theory, discovered by Burali-Forti (and Cantor himself). It is a “paradox”
or “antinomy” in that it contradicts the thesis (Frege’s) that for any formula
of set theory, 𝓕(x), the class {x : 𝓕(x)} is a set. The 𝓕(x) in question here is
“x is an ordinal”.


Observe, on the other hand, that by VI.4.13, by the fact that any α is
some (A, <₁) for some set A, and by collection, we have that {β : β < α} = <(α)
is a set for any α. In particular, this says that our tentative < on On is
left-narrow.


By the normal form theorem (VI.4.13), ({α : α < ‖(A, <₁)‖}, <), where “<”
is that of VI.4.9, is a member of [(A, <₁)]≅ for all (A, <₁).


Let us ponder then what would be the consequences if the principle of global
(strong) choice (invoked in VI.4.5) were to be so smart as to always pick


({α : α < ‖(A, <₁)‖}, <)    (i)


for “‖(A, <₁)‖”, for all WO sets (A, <₁). Since the order in all instances of (i)
is the same (that is, < of VI.4.9), we could go one step further and just use the
set {α : α < ‖(A, <₁)‖} as the ordinal for the WO set (A, <₁), implying, rather
than including explicitly, the order <. Of course, the sets in (i) are ≅-invariants
just as they are when thought of as WO sets under <. Under the pondered
circumstances we end up with a recurrence


‖(A, <₁)‖ = {α : α < ‖(A, <₁)‖}


where the α’s are the ordinals (according to our present speculative analysis)
assigned to the segments of (A, <₁) by VI.4.9.
The self-referential definition above can also be written more simply as


<(α) = α    (ii)


Let us “compute” (i.e., find which sets are) the first few ordinals. For example,†
(∅, <₁) has no segments; therefore the set {α : α < ‖(∅, <₁)‖} is empty, i.e.,


‖(∅, <₁)‖ = ∅, the smallest ordinal


Now, ({∅}, <₁) (where <₁ is empty as well) has one segment only, (∅, <₁).
Hence, exactly one ordinal, ∅, is smaller than ‖({∅}, <₁)‖. Thus


‖({∅}, <₁)‖ = {∅}    (iii)


† <₁ is the empty order here.

Next, let us compute ‖({a, b}, <₂)‖, where a ≠ b and a <₂ b. The only segments
are (∅, <₂) and ({a}, <₂), which have ordinals ∅ and {∅} respectively.

Of course, ({a}, <₂) ≅ ({∅}, <₁); hence ‖({a}, <₂)‖ = ‖({∅}, <₁)‖ = {∅}
by (iii). Thus,†


‖({a, b}, <₂)‖ = ‖({∅, {∅}}, <)‖ = {∅, {∅}}


Note that for the first three ordinals, at least, their order < coincides with ∈,
since


∅ ∈ {∅} ∈ {∅, {∅}}


This is true of all ordinals, for‡ β ∈ α iff (by (ii)) β ∈ <(α) iff β < α.


Continuing our pondering on what if global choice were smart, we observe
that each ordinal is a transitive set (Definition V.1.11). Indeed, let α ∈ β ∈ γ.
By the previous remark this is equivalent to α < β < γ; hence, by transitivity
of <, α < γ; therefore α ∈ γ. Of course, an ordinal, being the set of all the
smaller ordinals, will have as members only transitive sets.
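
For finite WO sets the recurrence “the ordinal of (A, <) is the set of the ordinals of its segments” can be executed directly. The sketch below is an informal illustration under our own conventions: a finite WO set is given as a Python list in ascending order, so that its segments are the proper prefixes of the list. The name `ordinal_of` is hypothetical.

```python
def ordinal_of(chain):
    """chain lists a finite WO set in ascending order; its segments are its proper prefixes."""
    return frozenset(ordinal_of(chain[:i]) for i in range(len(chain)))


zero = ordinal_of([])                  # no segments at all
one = ordinal_of(["a"])                # one segment, the empty one
two = ordinal_of(["a", "b"])           # segments: the empty set and {a}
three = ordinal_of(["x", "y", "z"])    # the names of the elements play no role
```

The values come out as ∅, {∅}, {∅, {∅}} and {∅, {∅}, {∅, {∅}}}, and membership between them coincides with their order, as observed in the text.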


Von Neumann showed that, surprisingly, these transitivity properties fully 
characterize the “appropriate concept” of an ordinal as a “special set”, without 
any recourse to any form of AC — from which we disengage in the following 
“permanent” definition — and without any a priori reliance on the concept of 
well-ordering either. 


The reader is now asked to consider all the preceding attempts to get ordinals 
off the ground as “motivational discussion” with a historical flavour. Therefore 
the definitions and consequences VI.4.2—VI.4.14 are to be discarded. Our formal 
study of ordinals starts with VI.4.16 below. In particular we will show that 
ordinals (as defined below) are ≅-invariants.


VI.4.16 Definition. (von Neumann.) An ordinal, or ordinal number, is a transitive
set all of whose members are also transitive sets.

For the record, we may introduce a unary predicate “Ord” (which says of
its argument that it is an ordinal) by the definition (1) below, where T itself is
the unary predicate introduced by T(x) ↔ ¬U(x) ∧ (∀y ∈ x)(∀z ∈ y) z ∈ x.
T(x) says that x is a transitive set:


Ord(x) ↔ T(x) ∧ (∀y ∈ x)T(y)    (1)


† < here is that of VI.4.9.

‡ By (ii), the only members of ordinals (in the current stage of the tentative definition) are
themselves ordinals. Thus “x ∈ α ∧ x ∈ On” (that is, “β ∈ α”) is equivalent to “x ∈ α”.

On will be the class of all ordinals; that is, On abbreviates the class term
{x : Ord(x)}. Lowercase Greek letters will denote arbitrary ordinals; that is,
we employ, in our argot, “ordinal-typed” variables α, β, γ, ..., with or without
subscripts or primes, a notation that we extend to unspecified ordinal constants.

Thus in instances such as “... α ...” we will understand more: “... α ∧ α ∈
On ...”. Of course, specific ordinals (i.e., specific ordinal constants) may have
names deviating from this rule (e.g., ∅ in the lemma below).


We now embark on developing the properties of ordinals. 


VI.4.17 Lemma. On ≠ ∅.


Proof. ∅ satisfies Definition VI.4.16; therefore, ∅ ∈ On.


VI.4.18 Example. Here are some more members of On, as the reader can
readily verify using VI.4.16: {∅}, {∅, {∅}}, {∅, {∅}, {∅, {∅}}}.

Indeed, every natural number n, and the set of natural numbers ω, are ordinals
by V.1.12-V.1.13.
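
The verification asked of the reader can be mechanized for hereditarily finite sets. The sketch below models sets as Python frozensets and implements the predicates T and Ord of VI.4.16. It assumes pure sets (no atoms), so the clause ¬U(x) holds automatically; the function names are ours.

```python
def is_transitive(x):
    """T(x): every member of a member of x is itself a member of x."""
    return all(z in x for y in x for z in y)


def is_ordinal(x):
    """Ord(x): x is transitive and so is each of its members."""
    return is_transitive(x) and all(is_transitive(y) for y in x)


empty = frozenset()
one = frozenset({empty})                         # {∅}
two = frozenset({empty, one})                    # {∅, {∅}}
three = frozenset({empty, one, two})             # {∅, {∅}, {∅, {∅}}}

not_transitive = frozenset({one})                # {{∅}}: ∅ is a member of a member, but not a member
bad_member = frozenset({empty, one, frozenset({one})})   # transitive, but its member {{∅}} is not
```

The last two sets show that both clauses of the definition matter: the first fails transitivity itself, while the second is transitive but has a non-transitive member.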


The definition of ordinals does not explicitly state that the members of an 
ordinal are themselves ordinals. The following lemma says that. 


VI.4.19 Lemma. If x ∈ α, then x ∈ On.


The above statement can be rephrased in a number of ways: “On is a transitive
class”, or “α ⊆ On for all α”, or “every member of an ordinal is an ordinal”. This
last formulation coincides with the results of our earlier informal discussion. □


Proof. Let 

        y ∈ x ∈ α        (1)


By VI.4.16 (transitivity of α), y ∈ α. Therefore (VI.4.16), y is a transitive set.
By VI.4.16 and (1), x is transitive. Since so is its arbitrary member y, x is an
ordinal. 


VI.4.20 Corollary (Normal Form Theorem). {β : β ∈ α} = α. 


Proof. For any set y, y = {x : x ∈ y}. In particular, α = {x : x ∈ α}. By VI.4.19, 

        ⊢_ZFC x ∈ α ↔ x ∈ α ∧ x ∈ On

Hence, α = {x : x ∈ α ∧ x ∈ On}. Thus, using the notational convention of 
VI.4.16, we may write α = {β : β ∈ α}. 


Another way to say the above is “∈(α) = α”. 


VI.4.21 Theorem (Burali-Forti Antinomy). On is not a set. 


Proof. Suppose On is a set. By VI.4.19 it is an ordinal; hence On ∈ On, which 
is impossible by foundation. 


VI.4.22 Lemma. ∈ restricted to On is a partial order with MC. 


“∈”, as the context hopefully makes clear, is here the relation defined by the 
predicate “∈”. We should not need to issue such warnings in the future. 


Proof. That ∈ has MC on On is trivial, since it does so on U_M (foundation). 
Next, x ∉ x for all sets, so that ∈ is irreflexive. Finally, α ∈ β ∈ γ implies 
α ∈ γ by VI.4.16; hence ∈ is transitive on On. 


VI.4.23 Theorem. ∈ well-orders On. 


Proof. By VI.4.22, it suffices to show that ∈ is total on On. To this end, let 

        𝒫(α, β) stand for α ∈ β ∨ α = β ∨ β ∈ α        (0)

We will show that 

        (∀α)(∀β)𝒫(α, β)        (1)

where, of course, quantification is over On, as the “typed” variables α and β 
make clear (cf. VI.4.16). We can prove (1) by ∈-induction or, equivalently, 
by ∈-MC. We do the latter. So let the negation of (1) hold (i.e., we argue by 
contradiction), that is, we add the assumption 

        (∃α)(∃β)¬𝒫(α, β)        (2)

By ∈-MC on On, let α₀ be ∈-minimal such that 

        (∃β)¬𝒫(α₀, β)        (3)

Next, let β₀ be ∈-minimal such that 

        ¬𝒫(α₀, β₀)        (4)

Let now 

        γ ∈ β₀        (5)

Then 𝒫(α₀, γ) by (4) and ∈-minimality of β₀. 𝒫(α₀, γ) (by (0)) yields one of: 

Case 1. α₀ = γ. That is, (by (5)) α₀ ∈ β₀, contradicting (4). 

Case 2. α₀ ∈ γ. By (5) and transitivity of β₀, again α₀ ∈ β₀; unacceptable. We 
must therefore have 

Case 3. γ ∈ α₀. 

Thus, by (5), 

        β₀ ⊆ α₀        (6)

Next, let 

        δ ∈ α₀        (7)

Then ∈-minimality of α₀, and (3),† yield 𝒫(δ, β₀). The latter yields in turn 
(by (0)) 

Case 1. δ = β₀. That is, (by (7)) β₀ ∈ α₀, contradicting (4). 

Case 2. β₀ ∈ δ. By (7) and transitivity of α₀, again β₀ ∈ α₀; unacceptable. This 
leaves 

Case 3. δ ∈ β₀. 

Hence α₀ ⊆ β₀, which along with (6) yields α₀ = β₀. This contradicts (4). 


The above proof is essentially a duplicate of that for the trichotomy of ∈ 
over the natural numbers (V.1.20), although here it covers a wider context. The 
reader may also wish to compare the above proof with that in Example VI.2.38. 


VI.4.24 Corollary. x ∈ On iff x is a transitive set that contains no atoms as 
members, and ∈ well-orders x. 


Proof. Only-if part. By VI.4.19, an ordinal x satisfies x ⊆ On. Thus, the
restriction of the well-ordering ∈ of On on x well-orders the latter. Moreover, by 
VI.4.16 no atom is an ordinal; thus y ∈ x → ¬U(y). 

If part. Let x be a transitive set that contains no atoms, and let the restriction 
of ∈ on x be a well-ordering. Let y ∈ x. First off, ¬U(y). 

We need only show that y is transitive. To this end let, in conjunctional 
notation, 

        u ∈ v ∈ y (∈ x)        (1)

Applying transitivity of x twice, we get in turn v ∈ x and u ∈ x. Thus {u, v, y} ⊆ 
x. Since ∈ is a well-ordering on x, it is also transitive on x. Hence, (1) yields 
that u ∈ y. But then y is a transitive set. 


† (Footnote to the proof of VI.4.23.) ¬(∃β)¬𝒫(δ, β) follows; hence (∀β)𝒫(δ, β);
thus 𝒫(δ, β₀) by specialization. 
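
As a sanity check of Corollary VI.4.24, one can compare the two characterizations
on every hereditarily finite pure set of small rank. This is a brute-force sketch
under our own assumptions: sets are nested frozensets, there are no atoms (so
"atom-free" holds automatically), and for a finite set "well-ordered by ∈" is
tested as "strictly totally ordered by ∈".

```python
from itertools import combinations


def powerset(s: frozenset) -> list:
    """All subsets of s, each as a frozenset."""
    items = list(s)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def is_transitive(x: frozenset) -> bool:
    return all(z in x for y in x for z in y)


def is_ordinal(x: frozenset) -> bool:
    """Definition VI.4.16, restricted to pure sets."""
    return is_transitive(x) and all(is_transitive(y) for y in x)


def well_ordered_by_membership(x: frozenset) -> bool:
    """For finite x: membership is an irreflexive, transitive, total relation on x."""
    e = list(x)
    irreflexive = all(a not in a for a in e)
    transitive = all(a in c for a in e for b in e for c in e if a in b and b in c)
    total = all(a == b or a in b or b in a for a in e for b in e)
    return irreflexive and transitive and total


# Build the 16 pure sets obtained by iterating the power set four times from {}.
v = frozenset()
for _ in range(4):
    v = frozenset(powerset(v))

agree = all(
    is_ordinal(x) == (is_transitive(x) and well_ordered_by_membership(x))
    for x in v
)
ordinals_found = [x for x in v if is_ordinal(x)]
```

Only the ordinals 0, 1, 2 and 3 occur among these 16 sets, and the two tests agree
on all of them.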


It is a trivial observation that the corollary above goes through even if
well-order(ing) were relaxed to total order(ing), as the reader can readily check. 
However the above “redundant” formulation is necessary if one desires to found 
the notion of ordinal in the absence of the foundation axiom (that axiom was 
used in VI.4.23 in an essential way). In that case one takes the statement of 
Corollary VI.4.24 as the definition of ordinals. 


In this discussion, enclosed between double “dangerous turn” road signs, we 
digress to peek into this possible avenue of founding ordinals. This discussion 
is only of use in the proof of the consistency of foundation with the remaining 
axioms of ZFC, and can otherwise be omitted with no loss of continuity. So 
we temporarily suspend here (i.e., until the end of this “doubly dangerous” 
material) the axiom of foundation, and define: 


VI.4.25 Alternative Definition (Ordinals in ZF − Foundation). x is an
ordinal iff it is a transitive atom-free set that is well-ordered by ∈. The
notational conventions of VI.4.16 will apply; in particular, we continue using
the symbol “On” for the class of all ordinals. 


We have the following consequences, in point form: 


(1) α ∉ α for all ordinals. (Careful here! We cannot rely on foundation to say 
    that ∈ is irreflexive.) Indeed, let 

        α ∈ α        (i)

    Since (the right hand) α is (well-)ordered by ∈, 

        ∈|α is irreflexive.        (ii)

    By (i), the left α is a member of the right α, so that (ii) yields α ∉ α (these 
    are two copies of the left α). We have just contradicted (i). 

(2) x ∈ α ∈ On → x ∈ On. Assume x ∈ α ∈ On. As in the proof of the if 
    part of VI.4.24, x is a transitive set. By transitivity of α, x ⊆ α. At once we 
    obtain that x is atom-free, since α is. Moreover, x is well-ordered by ∈, an 
    order inherited from α. Thus, the alternate Definition VI.4.25 too implies 
    that On is transitive, that is, ordinals only contain ordinals as members. 

(3) α ⊂ β → α ∈ β. Assume α ⊂ β, and let γ ∈ β − α (set difference) be 
    ∈-minimum (we say minimum rather than minimal because ∈ is total on β). 
    Therefore, if δ ∈ γ, then δ ∉ β − α. On the other hand, δ ∈ β by transitivity 
    of β. Thus δ ∈ α; hence 

        γ ⊆ α        (iii)

    Next, let 

        δ ∈ α        (iv)

    Since α ⊂ β, we have δ ∈ β. Moreover, γ ∈ β as well; hence (since ∈ is 
    total on β) we have three cases: 

    Case 1. γ = δ. This is untenable due to γ ∉ α and (iv). 

    Case 2. γ ∈ δ. This is impossible as well, as it yields γ ∈ α by transitivity 
    of α and (iv). This leaves 

    Case 3. δ ∈ γ. Along with (iv), this yields α ⊆ γ; hence α = γ by (iii). 

    We conclude (using =, ∈, ⊆ conjunctionally) that 

        α = γ ∈ (β − α) ⊆ β

    In short, α ∈ β. 

(4) α = β ∨ α ∈ β ∨ β ∈ α. Let α ≠ β. Observe that α ∩ β ⊆ α and α ∩ β ⊆ β. 
    Also, α ∩ β is transitive (verify) and well-ordered by ∈ (as a subset of α) 
    as well as atom-free (hence an ordinal). By hypothesis, one of the two 
    inclusions (⊆) must be proper (⊂), in which case the other is equality. 
    Indeed, if they are both proper, then, by (3), α ∩ β ∈ α and α ∩ β ∈ β; 
    hence α ∩ β ∈ α ∩ β, contradicting (1). Say, α ∩ β = α, i.e., α ⊆ β. Since 
    we have assumed α ≠ β, (3) yields α ∈ β. 

(5) α = {β : β ∈ α}. By (2). 

(6) On is well-ordered by ∈. We need only establish ∈-MC on On. So let ∅ ≠ 
    A ⊆ On, a subclass of On. Let α ∈ A. If α ∩ A = ∅, then ∈(α) ∩ A = ∅, 
    that is, α is ∈-minimal in A. If now α ∩ A ≠ ∅, then let β be ∈-minimal in 
    α ∩ A (α ∩ A is a subset of the WO set (α, ∈)). We argue that β is ∈-minimal 
    in A. If not, let γ ∈ A be such that γ ∈ β. Then γ ∈ α by transitivity of α; 
    hence γ ∈ α ∩ A. This contradicts the choice of β. 


This approach (sans foundation) may be considered attractive in that it relies
on fewer axioms than that of VI.4.16. We are nevertheless committed to 
having foundation (which has already provided us with some interesting
results in VI.2.38) and therefore will continue our development based on
Definition VI.4.16. 


VI.4.26 Definition. As is normal practice, we will often utilize the symbol “<” 
for the well-ordering ∈|On. Thus α < β means exactly α ∈ β. 


VI.4.27 Lemma. The reflexive closure, r_On(<) = ≤, of < coincides with ⊆ on 
On, i.e., ≤ = ⊆ on On. 


Proof. Let α ≤ β. This means that α = β or α ∈ β. In the former case, 
α ⊆ β is immediate. In the latter case it follows from the transitivity of β 
(x ∈ α ∈ β → x ∈ β). 

Conversely, let α ⊆ β. Since we want α = β ∨ α ∈ β, by VI.4.23 we only 
need to discredit the hypothesis β ∈ α. Well, that hypothesis, along with the 
original hypothesis, leads to β ∈ β. 
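
For the finite ordinals, both the trichotomy of VI.4.23 and the coincidence of ≤
with ⊆ from VI.4.27 can be observed directly. The sketch below assumes our own
encoding of the natural numbers as nested frozensets (the helper von_neumann is
hypothetical, not from the text); Python's "in" plays the role of ∈ and "<=" on
frozensets is ⊆.

```python
def von_neumann(n: int) -> frozenset:
    """The finite ordinal n as the set {0, 1, ..., n-1}."""
    x = frozenset()
    for _ in range(n):
        x = x | {x}
    return x


ordinals = [von_neumann(n) for n in range(6)]


def trichotomy_holds(a: frozenset, b: frozenset) -> bool:
    """Exactly one of a ∈ b, a = b, b ∈ a is true."""
    return [a in b, a == b, b in a].count(True) == 1


def leq_is_subset(a: frozenset, b: frozenset) -> bool:
    """(a ∈ b or a = b) holds exactly when a ⊆ b."""
    return (a in b or a == b) == (a <= b)


all_trichotomy = all(trichotomy_holds(a, b) for a in ordinals for b in ordinals)
all_leq_subset = all(leq_is_subset(a, b) for a in ordinals for b in ordinals)
```

A finite check of this kind is of course no substitute for the proofs above; it
only illustrates what the statements say.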


VI.4.28 Example. We already know that ∅ ∈ On. ∅ is the <-minimum element 
of On, since for any α, ∅ ⊆ α translates to ∅ ≤ α. 


VI.4.29 Remark. By VI.4.20, α = {β : β < α}, or α = <(α), i.e., an ordinal 
is the set of all smaller ordinals. Thus, Definition VI.4.16, offered without the 
assistance of either AC or the concept of well-ordering, yields formally the 
property (ii) of ordinals that we had arrived at in our wishful motivational 
discussion prior to VI.4.16. The next result will establish that ordinals are
≅-invariants of WO sets. Thus we have come now full circle, and all the pieces of 
the puzzle fit. 


VI.4.30 Theorem. Let (A, <₁) be a WO set. Then there is a unique ordinal α 
and a unique isomorphism φ_A for which 

        φ_A : (A, <₁) ≅ (α, <)

where < is the standard order ∈ on On. 


Proof. By VI.3.20, and for the WO classes (A, <₁) and (On, <),† we have these 
three alternatives: 


(1) (On, <) is isomorphic to a segment of (A, <₁). This is impossible, because 
    collection would force On to be a set. 

(2) (On, <) ≅ (A, <₁). Untenable, as in (1). 

(3) So it must be that (A, <₁) ≅ (<(α), <) for some α. By VI.4.20 (cf. VI.4.29), 
    this says 

        φ_A : (A, <₁) ≅ (α, <)        (i)

    for some φ_A. Suppose that (A, <₁) ≅ (β, <) as well, where (without loss 
    of generality) β < α, i.e., β ∈ α. By (i), (α, <) ≅ (β, <); hence (α, <) ≅ 
    (<(β), <), contradicting VI.3.16. This settles uniqueness of α. 

By VI.3.19, φ_A is unique. 


† Note that both <₁ and < are left-narrow, the former because A is a set, the
latter by VI.4.20. 


VI.4.31 Corollary. (α, <) ≅ (β, <) iff α = β. 


Proof. The if part is trivial. The only-if part was proved in the course of the 
above proof (uniqueness of α). 


VI.4.32 Remark. (1) We can prove VI.4.30 without recourse to VI.3.20 by 
defining φ_A on A by <₁-recursion as follows:† 

        φ_A(a) = {φ_A(b) : b <₁ a} for all a ∈ A

or 

        φ_A(a) = φ_A[<₁(a)]        (1)

The reader is asked to pursue this. Once successful, one can turn to prove VI.3.23 
(for WO sets) using the theorem on the comparability of ordinals. 

It is important to note that (1) is the only possible definition for φ_A, for it is 
tantamount to the requirement that φ_A be an isomorphism, i.e., that 

        φ_A(b) ∈ φ_A(a) iff b <₁ a

(2) Whenever (A, <₁) ≅ (α, <), we can intuitively think that the members 
of α serve as “indices” (or position names) for the ordering of the elements of 
A in ascending order. α is the first index not needed in this “enumeration” of 
A. Compare this remark with the discussion that launched this section. 

(3) With the normally accepted abuse of notation, we often write α ≅ β to 
mean (α, <) ≅ (β, <); thus Corollary VI.4.31 can be also stated as “α ≅ β iff 
α = β”. 

(4) Rephrasing VI.4.30, we can state that every [(A, <₁)]_≅ contains a unique 
ordinal WO set, (α, <). We can then re-introduce the symbol ‖(A, <₁)‖,
formally this time, without any reliance on any form of AC, to mean (α, <), where 
(A, <₁) ≅ (α, <). Actually, the following (final) definition picks just the ordinal 
α rather than the WO set (α, <). 


† We assume without loss of generality that <₁ has A as its field; therefore we
do not need to add “∧ b ∈ A” to the condition b <₁ a. 


VI.4.33 Definition (Order Types of WO Sets). For any WO set (A, <₁), the 
symbol ‖(A, <₁)‖ stands for the unique α which by VI.4.30 satisfies 

        (A, <₁) ≅ (α, ∈)

α is called the order type of (A, <₁). If A is a set of ordinals and <₁ = ∈, then 
we use the simpler notation ‖A‖ = α (rather than ‖(A, <₁)‖ = α). 
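
For a finite WO set, the recursion φ_A(a) = {φ_A(b) : b <₁ a} of Remark VI.4.32(1)
can be run literally, and the set of all values φ_A(a) is then the order type. The
following is a minimal sketch under our own assumptions: ordinals are nested
frozensets, the order is passed as a Python predicate less(b, a) meaning b <₁ a,
and the names order_type and von_neumann are hypothetical helpers.

```python
def von_neumann(n: int) -> frozenset:
    """The finite ordinal n, used here only to name the result."""
    x = frozenset()
    for _ in range(n):
        x = x | {x}
    return x


def order_type(elements, less):
    """Return (alpha, phi) where phi(a) = {phi(b) : b <_1 a} and alpha = phi[A].

    `elements` is a finite collection of hashable items and `less(b, a)`
    decides b <_1 a; termination relies on <_1 being a strict well-ordering.
    """
    phi = {}

    def value(a):
        if a not in phi:
            phi[a] = frozenset(value(b) for b in elements if less(b, a))
        return phi[a]

    alpha = frozenset(value(a) for a in elements)
    return alpha, phi


A = ["pear", "apple", "fig"]
alpha, phi = order_type(A, lambda b, a: b < a)   # alphabetical order on A
```

Here alpha is the ordinal 3, and phi sends "apple", "fig", "pear" to 0, 1, 2
respectively, which are the "indices" mentioned in Remark VI.4.32(2).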


VI.4.34 Corollary. ‖α‖ = α. 


Proof. (α, <) ≅ (α, <) and VI.4.33. 


VI.4.35 Corollary. (A, <₁) ≅ (B, <₂) iff ‖(A, <₁)‖ = ‖(B, <₂)‖. 


Proof. If part. (A, <₁) ≅ ‖(A, <₁)‖ = ‖(B, <₂)‖ ≅ (B, <₂). 
Only-if part. ‖(A, <₁)‖ ≅ (A, <₁) ≅ (B, <₂) ≅ ‖(B, <₂)‖ and VI.4.31. 


VI.4.36 Example. Let ∅ ≠ A be any class of ordinals (not just a set). Then ⋂A 
is a set. We will argue that it is an ordinal, indeed the smallest (<-minimum) 
ordinal in A. 

Let α ∈ A. Since ⋂A ⊆ α, ⋂A is well-ordered by < (i.e., ∈), and since 
α contains no atoms (by VI.4.16), nor does ⋂A. Using VI.4.24, we need 
only show that ⋂A is transitive in order to conclude that it is an ordinal. 

So let β ∈ α ∈ ⋂A. Thus, γ ∈ A → β ∈ α ∈ γ. By transitivity of γ, 
γ ∈ A → β ∈ γ; hence β ∈ ⋂A. 

Next, 

        α ∈ A → ⋂A ⊆ α        (1)

translates to (VI.4.27) α ∈ A → ⋂A ≤ α. Thus, it only remains to prove that 

        ⋂A ∈ A

or that some among the inclusions (1) is an equality. If not, by VI.4.27 

        α ∈ A → ⋂A ∈ α

Hence ⋂A ∈ ⋂A, a contradiction. 


VI.5. The Transfinite Sequence of Ordinals 


Ordinals have essentially been introduced in order to extend the natural number 
sequence, so that one can index, or number, elements of arbitrary WO sets. 

Also, in much the same way that the natural numbers are employed to label 
steps, or stages, in a mathematical construction (by a recursive definition over 
ω), ordinals can be used for the same purpose whenever the natural number 
sequence is not “long enough” for the labeling, or, in the case of a mathemat- 
ical construction, whenever the stages of the construction are too many to be 
labelled by natural numbers. In this section we learn how to count (or how to 
sequence) with the help of ordinals, and find that ordinals are naturally split 
into three mutually exclusive types, namely, 0, limit ordinals, and successor or- 
dinals. In the light of this classification we revisit, or rephrase, the principles of 
induction and recursive (inductive) definitions, already studied in Section VI.2 
for arbitrary WO classes, as these principles apply over On or over an arbitrary 
a. We will apply both induction and inductive definitions over On, under their 
new guise, in the next section to formally construct “all sets” starting from the 
urelements, and to study their properties. In particular, we will find there that 
the vague principle of “set formation by stages” — on which we have based 
the discovery, and “truth”, of all the ZFC axioms except that of collection and 
infinity — can be made precise with the help of ordinals. 

We recall here Definition V.1.1 of the set successor operation x U {x} on any 
set x, and the notational convention established in V.1.19. 


VI.5.1 Definition (Successor on On). The successor operation on sets x is 
defined by x U {x} and is generally denoted by S(x). 
If x € On, then x + 1 is a preferred synonym for S(x). 


The notation α + 1 for ordinals is consistent with that for the natural numbers 
(V.1.19). However, unlike the special case of natural numbers (which are “finite 
ordinals”), where n + 1 = 1 + n is provable for the free variable n over ω 
(cf. V.1.24), it is not the case that α + 1 = 1 + α is provable in general. For 
example, we will soon see that ⊢_ZFC 1 + ω ≠ ω + 1. Of course, we have not 
yet said what “1 + α” ought to mean in general, but that will be done soon. 

We also recall here the result of Lemma V.1.9 and the fact (see remark prior 
to the proof) that 


        ⊢_ZFC S(x) = S(y) → x = y

for free x and y, not just for variables restricted over ω. In particular, 

        ⊢_ZFC α + 1 = β + 1 → α = β        (1)


VI.5.2 Example (What If?). Let us prove (1) above again, this time without 
using foundation (which was used in V.1.9), but instead taking as independently 
given the fact that <, that is ∈, on On is a total order (for example, this would 
have been the avenue taken if we were to omit the axiom of foundation and 
define ordinals as in the alternate definition VI.4.25). 

Under such restrictions we still have trichotomy (see p. 336, item (4) of the
consequences of VI.4.25), i.e., we have one of α = β, α ∈ β, β ∈ α. So assume
α + 1 = β + 1, and let 

        α ∈ β        (2)

Now, the hypothesis α ∪ {α} = β ∪ {β} implies (via ⊇) that β ∈ α or β = α, 
either of which in turn implies, along with (2), that β ∈ β, contradicting the 
irreflexivity of the order ∈ on On. Similarly, β ∈ α is untenable. This leaves 
α = β. 


VI.5.3 Lemma. α + 1 is an ordinal. 


Proof. Let y ∈ α + 1 = α ∪ {α}. Then y ∈ α or y = α; hence y ∈ On in either 
case. 

So every member of α + 1 is transitive. Next, x ∈ y ∈ α + 1 implies (as 
above) the cases x ∈ y ∈ α and x ∈ y = α. 

In either case x ∈ α; hence x ∈ α + 1. Thus α + 1 is transitive too. 


VI.5.4 Lemma. ⊢_ZFC α < β ↔ α + 1 ≤ β. 


Proof. ←: Let α ∪ {α} ⊆ β (translating “≤” to “⊆” via VI.4.27). Then α ∈ β. 
→: Let α ∈ β. By transitivity of β, α ⊆ β, which along with the hypothesis 
gives α ∪ {α} ⊆ β. 


As a special case, we have ⊢_ZFC n < m ↔ n + 1 ≤ m, where n, m are natural 
number variables. 


VI.5.5 Lemma. ⊢_ZFC α < β ↔ α + 1 < β + 1. 


Proof. →: Let α < β. By VI.5.4, α + 1 ≤ β. But β < β + 1 (i.e., β ∈ β ∪ {β}); 
hence α + 1 < β + 1 (this formula simply says “β < β + 1” if α + 1 = β; 
otherwise it follows from the transitivity of < on On). 

←: Let α + 1 < β + 1. Possible conclusions are β < α, β = α, or 
α < β. The first option gives β + 1 < α + 1 (by the →-direction) and hence 
α + 1 < α + 1 by the hypothesis and transitivity, contradicting irreflexivity 
of <. The second option gives (Leibniz) β + 1 = α + 1, again contradicting 
irreflexivity along with the assumption. It remains to take α < β. 


As a special case, we have ⊢_ZFC n < m ↔ n + 1 < m + 1, where n, m are 
natural number variables. 
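
The special cases over the natural numbers can be observed concretely. The sketch
below assumes, as an encoding of our own, that finite ordinals are nested
frozensets; succ, is_ordinal and von_neumann are hypothetical helper names. It
checks VI.5.3, VI.5.4 and VI.5.5 for all pairs of ordinals below 6, with ∈ as <
and ⊆ as ≤.

```python
def succ(x: frozenset) -> frozenset:
    """S(x) = x U {x}."""
    return x | {x}


def is_transitive(x: frozenset) -> bool:
    return all(z in x for y in x for z in y)


def is_ordinal(x: frozenset) -> bool:
    return is_transitive(x) and all(is_transitive(y) for y in x)


def von_neumann(n: int) -> frozenset:
    x = frozenset()
    for _ in range(n):
        x = succ(x)
    return x


nats = [von_neumann(n) for n in range(6)]

checks = {
    # VI.5.3: the successor of an ordinal is an ordinal.
    "VI.5.3": all(is_ordinal(succ(a)) for a in nats),
    # VI.5.4: a < b  iff  a + 1 <= b  (with <= read as subset).
    "VI.5.4": all((a in b) == (succ(a) <= b) for a in nats for b in nats),
    # VI.5.5: a < b  iff  a + 1 < b + 1.
    "VI.5.5": all((a in b) == (succ(a) in succ(b)) for a in nats for b in nats),
    # Formula (1): the successor operation is one-to-one.
    "(1)": all((succ(a) == succ(b)) == (a == b) for a in nats for b in nats),
}
```

Every entry of checks evaluates to True on this finite sample.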


VI.5.6 Lemma. α + 1 ≠ ∅. 


Proof. α ∈ α + 1. 


VI.5.7 Remark. Lemma VI.5.6 generalizes the case of natural numbers (V.1.8) 
to all ordinals. 

From now on we will freely use the symbol 0 for ∅ in all contexts where 
the latter is thought of as an ordinal rather than just the empty set, since 0 is 
the symbol we have assigned to the smallest ordinal, the natural number 0 
(V.1.19). 


As ω is an inductive set, indeed the ⊆-smallest inductive set (V.1.5), it follows 
that whenever n ∈ ω then also n + 1 ∈ ω; thus we cannot “reach” ω if we start 
at 0 and keep applying the successor operation. This makes ω a limit ordinal. 


VI.5.8 Definition. A limit ordinal is an α such that 


(1) α ≠ 0, and 
(2) whenever β ∈ α, then also β + 1 ∈ α. 


The notation “Lim(α)” says “α is a limit ordinal”. 

An ordinal α is a successor ordinal, or simply successor, just in case α = 
β + 1 for some β. 


(1) Some authors allow 0 to be a limit ordinal (e.g., Jech (1978b)). 

(2) Rephrasing VI.5.8, we can say (by V.1.1) that Lim(α) iff α is an inductive 
    set. 

(3) Every n ∈ ω such that n ≠ 0 is a successor ordinal (V.1.10). So successor 
    ordinals exist ({∅} and {∅, {∅}}, or 1 and 2, are two specific examples). 

(4) If α = β + 1, then β is uniquely determined by α (by (1) on p. 340). We 
    use the notation β = α − 1 or β = pr(α) and call β the predecessor of α 
    (cf. V.1.15), but will not bother to introduce pr on On formally. 


VI.5.9 Proposition. An ordinal is a successor ordinal iff it is a successor set. 


Proof. The only-if part is trivial. 

If part. Say x ∪ {x} = α. Then x ∈ α; hence x = β for some β. 


VI.5.10 Proposition. Limit ordinals exist. 

That is, there is a set that is a limit ordinal. 


Proof. By the axiom of infinity (V.1.3), it has already been established that ω 
is a set that is a limit ordinal. 


VI.5.11 Theorem. Every α falls under exactly one of the following cases: 


(1) α = 0, 
(2) Lim(α), 
(3) α is a successor. 


Proof. First, the cases are mutually exclusive. Indeed, (1) excludes (2) by
definition, while it excludes (3) by VI.5.6. We verify that (2) excludes (3). Say 
Lim(α), yet α = β + 1 for some β. Then β < α and, by clause (2) of VI.5.8,
β + 1 < α, i.e., α < α, a contradiction (see also V.1.14). 

Next, let α ≠ 0 and also ¬Lim(α). Therefore, by trichotomy, for some 
β < α we have α ≤ β + 1. By VI.5.4 and ≤ = ⊆ on On, this yields α = β + 1; 
thus α is a successor. 
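
To see all three cases of VI.5.11 at once, one needs ordinals beyond ω, which
nested finite sets cannot represent. The sketch below therefore uses a purely
hypothetical encoding that is not part of the text: a pair (k, n) of natural
numbers stands for the ordinal usually written ω·k + n (ordinal arithmetic has not
been introduced yet, so this notation is an assumption), pairs are compared
lexicographically, and the successor of (k, n) is (k, n + 1).

```python
from typing import Tuple

Ord = Tuple[int, int]   # (k, n) is our stand-in for "omega*k + n" (an assumption)


def successor(a: Ord) -> Ord:
    k, n = a
    return (k, n + 1)


def classify(a: Ord) -> str:
    """Return which of the three mutually exclusive cases of VI.5.11 applies."""
    k, n = a
    if (k, n) == (0, 0):
        return "zero"
    if n > 0:
        return "successor"      # a = (k, n-1) + 1
    return "limit"              # k > 0 and n == 0


def predecessor(a: Ord) -> Ord:
    """pr(a) for a successor; undefined (an error) otherwise."""
    if classify(a) != "successor":
        raise ValueError("only successor ordinals have a predecessor")
    k, n = a
    return (k, n - 1)


omega: Ord = (1, 0)

# Clause (2) of VI.5.8, spot-checked at omega: beta < omega implies beta + 1 < omega.
closed_under_successor = all(successor((0, n)) < omega for n in range(100))
```

In this encoding classify((0, 0)) is "zero", classify((0, 5)) and
classify((1, 3)) are "successor", and classify((1, 0)), our stand-in for ω, is
"limit".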


Since On and any α are well-ordered by <, the results on induction and
inductive definitions presented in Section VI.2 carry over with minor translations: 


VI.5.12 Theorem (Induction over On on Variable α). To prove (∀α)ℱ(α) it 
suffices to prove, for arbitrary α, that ℱ(α) follows from the induction
hypothesis (∀β < α)ℱ(β). 


Of course, “(∀α)” means “(∀α ∈ On)” (VI.4.16), while “for arbitrary α” means 
that α is a free ordinal variable. 


VI.5.13 Theorem (Induction over δ on Variable α). To prove (∀α < δ)ℱ(α) 
it suffices to prove, for arbitrary α < δ, that ℱ(α) follows from the induction 
hypothesis (∀β < α)ℱ(β). 


The general case of inductive definitions (VI.2.25) becomes: 


VI.5.14 Theorem (Recursive or Inductive Definitions over On). Let G be a 
(not necessarily total) function G : On × U_M → X, for some class X. Then 
there exists a unique function F : On → X satisfying 

        (∀α)F(α) ≃ G(α, F ↾ α)


Proof. Recall that <(α) = α. In particular, this makes < over On left-narrow,
so that VI.2.25 applies. 


VI.5.15 Theorem (Recursive or Inductive Definitions over δ). Let G be a 
(not necessarily total) function G : δ × U_M → X, for some class X. Then there 
exists a unique function F : δ → X satisfying 

        (∀α ∈ δ)F(α) ≃ G(α, F ↾ α)


Particularly useful is the translation of Corollary VI.2.31 in the context of 
On. We obtain two corollaries: 


VI.5.16 Corollary (Pure Recursion over On). Let G be a (not necessarily 
total) function G : U_M → X, for some class X. Then there exists a unique 
function F : On → X satisfying 


(1) (∀α)F(α) ≃ G(F ↾ α), 
(2) dom(F) is either On, or some α. 


VI.5.17 Corollary (Pure Recursion over δ). Let G be a (not necessarily total) 
function G : U_M → X, for some class X. Then there exists a unique function 
F : δ → X satisfying 


(1) (∀α ∈ δ)F(α) ≃ G(F ↾ α), 
(2) dom(F) is either δ, or some α < δ. 


Theorem VI.5.11 leads to some additional interesting formulations of
induction and inductive definitions over On (or over some δ). 


VI.5.18 Theorem (Induction over On Rephrased). Let S ⊆ On satisfy 


(1) 0 ∈ S, 
(2) (∀α)(α ∈ S → α + 1 ∈ S), 
(3) whenever Lim(α), the hypothesis (∀β < α)β ∈ S implies α ∈ S. 


Then S = On. 


Proof. Let instead S ≠ On, and let α be the minimum element in On − S. 
By VI.5.11, α must be one of 0, a successor, or a limit ordinal. 

Well, α ≠ 0 by (1). Next, α is not a successor either, for otherwise α − 1 ∈ S, 
hence α ∈ S by (2). 

Thus, perhaps Lim(α). If so, by minimality of α, (∀β < α)β ∈ S. By (3) 
this entails α ∈ S, once more contradicting the choice of α. So the hypothesis 
S ≠ On is untenable. 


Of course, the above can be rephrased in terms of a formula ℱ(α). The reader 
will easily carry out this translation by letting S = {α : ℱ(α)}. 


We offer two reformulations of inductive definitions over On, leaving the 
untold variations to the reader’s imagination. 


VI.5.19 Theorem (Recursive or Inductive Definitions over On Rephrased). 
Let G and H be (not necessarily total) functions On × U_M → X and On × 
X → X respectively, for some class X. Then there exists a unique function 
F : On → X satisfying: 


(1) F(0) = x (for some x ∈ X), 
(2) (∀α)F(α + 1) ≃ H(α, F(α)), 
(3) (∀α)(Lim(α) → F(α) ≃ G(α, F ↾ α)). 


Proof. Define G̃ : On × U_M → X by 

                   x                       if α = 0
        G̃(α, f) =  G(α, f)                 if Lim(α)
                   H(α − 1, f(α − 1))      if dom(f) ⊇ α ∧ α is a successor
                   ↑                       otherwise

Thus, by VI.5.11, (1)-(3) translate to (∀α)F(α) ≃ G̃(α, F ↾ α). 
Note that by III.11.4 it is not necessary to add in the third case above “∧ f 
is a function”, for dom(f) makes sense regardless. 


VI.5.20 Theorem (Pure Recursive Definitions over On Rephrased). Let G 
and H be (not necessarily total) functions U_M → X and X → X respectively, 
for some class X. Then there exists a unique function F : On → X satisfying: 


(1) F(0) = x (for some x ∈ X), 
(2) (∀α)F(α + 1) ≃ H(F(α)), 
(3) (∀α)(Lim(α) → F(α) ≃ G(F ↾ α)), 
(4) dom(F) is either On, or some α. 


Let us now probe further into the ordinals. 


VI.5.21 Example. (Refer to Example VI.2.34.) The support function x ↦ 
sp(x) is defined on all sets by the recursive definition† 

        sp(x) = ⋃_{y ∈ x} sp(y)        (1)

By induction over On, assume for all β < α that sp(β) = ∅. Thus, by (1), 
sp(α) = ⋃_{β ∈ α} sp(β) = ∅. In sum 

        ⊢_ZFC (∀α)sp(α) = ∅

i.e., all ordinals are pure sets. 
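
The recursion defining sp is easy to run on hereditarily finite sets with atoms.
In the sketch below the modeling choices are our own assumptions: sets are
frozensets, anything that is not a frozenset (here, a string) plays the role of an
atom, and sp(x) = {x} for an atom x, as in the footnote to (1).

```python
def sp(x) -> frozenset:
    """Support of x: the set of atoms occurring in x at any depth."""
    if not isinstance(x, frozenset):      # U(x): x is an atom (modeling assumption)
        return frozenset({x})
    result = frozenset()
    for y in x:                           # sp(x) = union of sp(y) over y in x
        result = result | sp(y)
    return result


def von_neumann(n: int) -> frozenset:
    x = frozenset()
    for _ in range(n):
        x = x | {x}
    return x


mixed = frozenset({"a", frozenset({"b", frozenset()})})   # the set {a, {b, {}}}
support_of_mixed = sp(mixed)
support_of_three = sp(von_neumann(3))
```

The set mixed has support {a, b}, while the ordinal 3 has empty support, in
agreement with the conclusion that ordinals are pure sets.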


In view of the fact that the successor operation, +1, is inadequate to “reach” 
limit ordinals, we search for more powerful operations. 


VI.5.22 Theorem. Let A ⊆ On be a nonempty set. Then 


(1) ⋃A is an ordinal, 
(2) α ≤ ⋃A for all α ∈ A, 
(3) ⋃A is the least ordinal with property (2). 


Proof. (1): Let x ∈ y ∈ ⋃A. Thus x ∈ y ∈ α, for some α ∈ A. Therefore 
x ∈ α, and hence x ∈ ⋃A, proving that ⋃A is transitive. On the other hand, 
from the above, since y was arbitrary, we have that every element of ⋃A is an 
ordinal. This settles (1). 

(2): This translates to α ⊆ ⋃A, which is trivial. 

(3): Finally, let (∀α ∈ A)α ≤ δ for some δ. That is, (∀α ∈ A)α ⊆ δ; hence 
⋃A ⊆ δ. 
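
For a finite set A of finite ordinals, ⋃A and ⋂A can be computed outright, which
illustrates VI.5.22 alongside Example VI.4.36. This is a sketch under our own
encoding of ordinals as nested frozensets; big_union, big_intersection and
von_neumann are hypothetical helper names.

```python
def von_neumann(n: int) -> frozenset:
    x = frozenset()
    for _ in range(n):
        x = x | {x}
    return x


def big_union(A) -> frozenset:
    """The union of all members of A."""
    out = frozenset()
    for a in A:
        out = out | a
    return out


def big_intersection(A) -> frozenset:
    """The intersection of all members of A (A must be nonempty)."""
    it = iter(A)
    out = next(it)
    for a in it:
        out = out & a
    return out


A = {von_neumann(1), von_neumann(4), von_neumann(2)}
sup_A = big_union(A)            # least upper bound of A, by VI.5.22
min_A = big_intersection(A)     # smallest member of A, by VI.4.36

candidates = [von_neumann(n) for n in range(7)]
is_upper_bound = all(a <= sup_A for a in A)
is_least = all(sup_A <= d for d in candidates if all(a <= d for a in A))
```

Here sup_A is the ordinal 4 and min_A is the ordinal 1; the subset tests are the
translation of ≤ provided by VI.4.27.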


VI.5.23 Definition. Let < be a partial order on X, and B ⊆ X. 


(1) a ∈ X is an upper bound of B iff (∀b ∈ B)b ≤ a. It is a strict upper bound 
    iff (∀b ∈ B)b < a. 

(2) c ∈ X is a least upper bound, or supremum, of B iff it is an upper bound of 
    B, and for any upper bound a of B we have c ≤ a. 


† sp(x) = {x} on the assumption U(x). (Footnote to the recursive definition of
sp in VI.5.21.) 


VI.5.24 Remark. 


(1) Suprema are unique, for if c and c′ are suprema of B, then c ≤ c′ and c′ ≤ c, 
    and hence c = c′ by antisymmetry. We write c = sup(B) or c = lub(B). 

    Upper bounds with respect to the inverse order > are called lower bounds 
    with respect to <. Correspondingly, least upper bounds with respect to > 
    are called greatest lower bounds or infima (singular infimum) with respect 
    to <. The latter are also unique, and we write d = inf(B) or d = glb(B) to 
    indicate that d is the infimum of B (in (X, <)). 

(2) If B = ∅, then any a ∈ X is a lower bound, strict lower bound, upper 
    bound, and strict upper bound of B. Thus, the empty set has a supremum in 
    X iff X has a <-minimum element (which, of course, is unique if it exists). 
    Similarly, the statement “∅ has a glb” is equivalent to “X has a <-maximum 
    element”. 


With the notation just introduced, Theorem VI.5.22 yields 

VI.5.25 Corollary. Let A ⊆ On, A a set. Then On ∋ sup(A) = ⋃A. 


Note that above, consistently with Remark VI.5.24, A = ∅ implies that its sup 
in On is 0. In other words, sup(A) = ⋃A is valid for A = ∅. 


VI.5.26 Corollary. Let ∅ ≠ A ⊆ On, A a set. Then 


(1) The smallest ordinal strictly greater than all ordinals in A is
    sup{α + 1 : α ∈ A}, denoted by sup⁺(A). 

(2) If A has a maximum element γ, then sup(A) = γ and sup⁺(A) = γ + 1. 

(3) If A does not have a maximum, then sup(A) = sup⁺(A). 


Proof. (1): By collection (cf. III.8.9), B = {α + 1 : α ∈ A} is a set. By VI.5.22, 
sup⁺(A) = sup(B) = ⋃B is the smallest ordinal such that 

        α + 1 ≤ sup⁺(A)        (i)

for all α ∈ A. But (i) is equivalent to α < sup⁺(A). 

(2): Since α ≤ γ for all α ∈ A, γ is an upper bound of A; hence sup(A) ≤ γ. 
But γ ∈ A as well; hence γ ≤ sup(A). So sup(A) = γ. For the rest, γ + 1 
is trivially a strict upper bound of A. Let also δ be a strict upper bound. In 
particular, γ < δ (since γ ∈ A); hence γ + 1 ≤ δ by VI.5.4, establishing 
γ + 1 = sup⁺(A). 

(3): sup(A) is smallest satisfying 

        α < sup(A)        (ii)

for all α ∈ A, the “≤” becoming “<” due to the absence of a maximum element. 
But sup⁺(A) also is smallest that satisfies (i), by (1) (regardless of the issue of 
maximum). Hence sup(A) = sup⁺(A). 


If ∅ ≠ A ⊆ On does not have a maximum, then sup(A) is a limit ordinal (see 
Exercise VI.14). 
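
Clauses (1) and (2) of VI.5.26 can be illustrated on a finite set of finite
ordinals. The sketch below assumes our own encoding of ordinals as nested
frozensets, with hypothetical helpers succ, sup and sup_plus. Clause (3) cannot be
exercised this way, since every nonempty finite set of ordinals has a maximum.

```python
def succ(x: frozenset) -> frozenset:
    return x | {x}


def von_neumann(n: int) -> frozenset:
    x = frozenset()
    for _ in range(n):
        x = succ(x)
    return x


def sup(A) -> frozenset:
    """sup(A) computed as the union of A (VI.5.25)."""
    out = frozenset()
    for a in A:
        out = out | a
    return out


def sup_plus(A) -> frozenset:
    """sup+(A) = sup{a + 1 : a in A}, as in VI.5.26(1)."""
    return sup({succ(a) for a in A})


A = {von_neumann(0), von_neumann(2), von_neumann(5)}

# For finite ordinals, the maximum of A is its member with the most elements.
gamma = max(A, key=len)

strictly_above_all = all(a in sup_plus(A) for a in A)   # a < sup+(A) for every a in A
```

With this A we get sup(A) = γ = 5 and sup⁺(A) = γ + 1 = 6.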


VI.5.27 Definition (Transfinite Sequences). A transfinite sequence is a
function f such that either dom(f) = On or dom(f) = α and ω < α. In the former 
case it is also termed an On-sequence, in the latter case an α-sequence. 


As is usual, we think of f as the “sequence” (f(δ))_{δ<α}, or (f(δ))_{δ∈On} 
if appropriate, and even write (f_δ)_{δ<α} or (f_δ)_{δ∈On}. In the latter notation
the argument δ becomes an index (or subscript), and f_δ is the term of the sequence 
at location (position) δ. The concept of transfinite sequence derives from that 
of the familiar finite sequences (case where α < ω) and infinite sequences 
(case where α = ω). Intuitively, a transfinite sequence is just too “long” to 
be enumerated by natural numbers, and therefore it requires ordinals beyond 
natural numbers as indices for its terms. 


VI.5.28 Informal Definition. A total F : A → B, where each of A, B is
equipped with a partial order, <₁, <₂ respectively, is: 


(1) Non-decreasing or monotone just in case F(x) ≤₂ F(y) whenever x <₁ y in 
    A. If <₁ is ∈ and A = α, then F is an ascending sequence. The terminology 
    derives from F(0) ≤₂ F(1) ≤₂ F(2) ≤₂ ···. 

(2) Increasing just in case it is order-preserving in the sense of VI.3.4, that is, 
    F(x) <₂ F(y) whenever x <₁ y in A. 

(3) Continuous just in case, for each nonempty set S ⊆ A, if t = sup(S), then 
    sup(F[S]) exists and F(t) = sup(F[S]). 

(4) Countably continuous just in case for each ascending sequence s : ω → A, if 
    t = sup(ran(s)), then sup(F[ran(s)]) exists and F(t) = sup(F[ran(s)]). More 
    suggestively, we write this as F(lim_n x_n) = lim_n F(x_n), where x_n = s(n) 
    for n ∈ ω. 


See also the discussion in VI.3.5. 


“Continuity” here is a concept akin to continuity of functions of a real
variable, if we interpret sup as a limit in the sense of calculus: Here sup commutes 
with the function letter, F(sup(S)) = sup(F[S]), exactly as lim does in the 
case of limits in calculus. 

Note that continuity implies quite a bit about the right field of F. In particular, 
it implies the existence of suprema under certain conditions (that the sets
considered are images under F of sets in A that themselves have suprema). On the
other hand, if we independently know that, say, every nonempty subset of B has a
supremum (for example, a complete lattice does), then all that continuity of F
adds is that the objects F(t) and sup(F[S]) are equal. 


We define three additional concepts when F is a transfinite sequence. 


VI.5.29 Definition. A transfinite sequence f with ran(f) ⊆ On is 


(1) weakly continuous iff, for each α ∈ dom(f), if Lim(α) then f(α) = 
    sup{f(β) : β < α}, 


(2) normal iff it is increasing and weakly continuous, 
(3) weakly normal iff it is non-decreasing and weakly continuous. 


VI.5.30 Proposition. A continuous function F, as in VI.5.28, is non- 
decreasing. 


Proof. Let x <₁ y in A. Now, sup{x, y} = y, hence sup{F(x), F(y)} exists and 

        F(y) = sup{F(x), F(y)}

(recall, F is total), i.e., F(x) ≤₂ F(y). 


VI.5.31 Corollary. A countably continuous function F, as in VI.5.28, is
non-decreasing. 


Proof. Let x <₁ y in A. Then 

                 x    if n = 0
        s = λn.
                 y    if n > 0

is an ascending sequence and ran(s) = {x, y}. 


VI.5.32 Proposition. A continuous transfinite sequence f with ran(f) ⊆ On is 
weakly continuous. 


Proof. Let Lim(α) and α ∈ dom(f). Now, α = sup{β : β < α}; hence, by
continuity, f(α) = sup{f(β) : β < α}. 


VI.5.33 Corollary. A continuous transfinite sequence f with ran(f) ⊆ On is 
weakly normal. 


VI.5.34 Corollary. An increasing continuous transfinite sequence f with
values in On is normal. 


The following establishes the converse of VI.5.33. It will prove useful in 
Section VI.10. 


VI.5.35 Proposition. If a transfinite sequence f is weakly normal, then it is
continuous.


Proof. Let ∅ ≠ S ⊆ On be a set. Let α = sup S (it exists by VI.5.22).
If α ∈ S, then f(α) ∈ f[S] is maximum, since f is non-decreasing.

Let α ∉ S. Then Lim(α) by Exercise VI.14. Now

    sup f[S] = ⋃{f(γ) : γ ∈ S} ⊆ ⋃{f(γ) : γ ∈ α}    (1)

since γ ∈ S → γ ∈ α. On the other hand, the choice of α yields

    (∀γ ∈ α)(∃δ ∈ S) γ ≤ δ

Hence

    (∀γ ∈ α)(∃δ ∈ S) f(γ) ≤ f(δ)

and the ⊆ in (1) is promoted to equality. But the right hand side of ⊆ is f(α)
by weak continuity.


VI.5.36 Corollary. If f is normal, then it is continuous.


VI.5.37 Example. A transfinite sequence f can be weakly continuous without
being continuous. This is because weak continuity can be satisfied by a function
which is not non-decreasing. For example, let f : ω + 1 → ω + 1 be given by

           { 2k      if x = 2k + 1 ∧ k ∈ ω
    f(x) = { 2k + 1  if x = 2k ∧ k ∈ ω
           { ω       if x = ω

Thus, f(2k) = 2k + 1 and f(2k + 1) = 2k for all k ∈ ω, so f is not
non-decreasing. On the other hand, f(ω) = ω = sup{n : n < ω} = sup{f(n) : n < ω},
and ω is the only limit ordinal in dom(f).


So continuity of an f : On → On is equivalent to a (weak) monotonicity
property, together with weak continuity. On the other hand, by the above
example, weak continuity alone does not imply any monotonicity property for the
function. It turns out that with a bit of a boost, weak continuity can imply a
strong monotonicity property for the function, and hence continuity by VI.5.35.


VI.5.38 Proposition. If f is a weakly continuous On-sequence of ordinals
that moreover satisfies (∀α) f(α) < f(α + 1), then f is increasing, and hence
normal.


Proof. We need to show f(α) < f(β) for all α < β. We do induction on β.

Basis. β = 0. The contention is vacuously satisfied.

The successor case. Say β = γ + 1, and let α < β. Thus, α = γ or α < γ.
In the former case, f(α) < f(β) by the assumption; in the latter case, by I.H.,
the assumption, and transitivity of <.

The limit ordinal case. Say Lim(β). By weak continuity,

    f(β) = sup{f(γ) : γ < β}    (1)

Let now α < β, hence α + 1 < β, so that

    f(α) < f(α + 1)    by assumption
         ≤ f(β)        by (1)

Note that the I.H. was not needed in this case.


VI.5.39 Proposition. If f is a weakly continuous On-sequence of ordinals that
moreover satisfies (∀α) f(α) ≤ f(α + 1), then f is non-decreasing, and hence
weakly normal.


Proof. Exercise VI.16. 


The following easy result will be useful in Section VI.10. It says that a 
normal On-sequence of ordinals maps limit ordinals to limit ordinals. 


VI.5.40 Proposition. If f is a normal On-sequence of ordinals and Lim(α),
then also Lim(f(α)).


Proof. Suppose that Lim(α). Then f(α) = sup{f(γ) : γ < α}. But f(α) ∉
{f(γ) : γ < α} because f is increasing. Thus Lim(f(α)) by Exercise VI.14.


VI.5.41 Definition (Fixed Points). A fixed point (also called fixpoint
sometimes) of a function F : A → A is a u ∈ A such that F(u) = u.


VI.5.42 Theorem (Knaster-Tarski). Let (A, ≤) be a PO class with a minimum
element t, and where every ascending sequence has a supremum (meaning that
its range does). If F : A → A is countably continuous, then F has a fixed point.


Proof. Define by recursion over ω the sequence (s_n)_{n<ω} by

    s_0 = t
    s_{n+1} = F(s_n)

By induction on n one sees that the sequence is ascending. Indeed, t = s_0 ≤ s_1
(t is minimum), and if (I.H.) s_n ≤ s_{n+1}, then, by VI.5.31, s_{n+1} = F(s_n) ≤
F(s_{n+1}) = s_{n+2}.

Let u = sup{s_n : n < ω}. By countable continuity,

    F(u) = sup{F(s_n) : n < ω}
         = sup{s_{n+1} : n < ω}    (1)
         = sup{s_n : n < ω}        (2)
         = u

where the passage from (1) to (2) is justified by s_0 ≤ s_n for all n ∈ ω.
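On a finite lattice every monotone map is countably continuous, since an ascending sequence is eventually constant, so the iteration s_0 = t, s_{n+1} = F(s_n) of the proof can be run directly. The following Python sketch does this on the powerset of {0, ..., 4} ordered by inclusion; the graph EDGES and the names least_fixed_point and step are illustrative assumptions, not part of the text.

```python
def least_fixed_point(F, bottom):
    """Iterate s_0 = bottom, s_{n+1} = F(s_n) until the value stops changing."""
    s = bottom
    while True:
        nxt = F(s)
        if nxt == s:          # F(s) = s: a fixed point has been reached
            return s
        s = nxt


# Hypothetical directed graph on the nodes 0..4 (an assumption for the demo).
EDGES = {0: {1}, 1: {2}, 2: {0}, 3: {4}, 4: set()}


def step(nodes):
    """Monotone map on subsets of {0,...,4}: add 0 and all one-step successors."""
    return frozenset({0}) | frozenset(m for n in nodes for m in EDGES[n])


# The minimum element t of the powerset lattice is the empty set.
reachable = least_fixed_point(step, frozenset())
print(sorted(reachable))
```

The stages here are ∅, {0}, {0, 1}, {0, 1, 2}, after which the sequence is constant; the limit is the set of nodes reachable from 0.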


VI.5.43 Corollary. The fixed point u of the previous theorem is ≤-least. That
is, any c such that F(c) ≤ c satisfies u ≤ c.

In particular, any c such that F(c) = c satisfies u ≤ c.


Proof. Let F(c) ≤ c. Now, by induction on n we see that s_n ≤ c for all n < ω,
because s_0 = t ≤ c, and if (I.H.) s_n ≤ c, then s_{n+1} = F(s_n) ≤ F(c) ≤ c.

Thus u = sup{s_n : n < ω} ≤ c.


VI.5.44 Corollary. If f is a weakly normal transfinite On-sequence, then it
has a fixed point β.


Proof. The proof follows that of Theorem VI.5.42. We must just ensure that
the key assumptions hold. Well, On has a minimum element, and if (s_n)_{n<ω}
is ascending, then sup{s_n : n < ω} exists by VI.5.22. The rest is taken care of
by VI.5.35.


VI.5.45 Corollary. If f is a normal transfinite On-sequence, then, for every
γ, it has a fixed point β such that γ < β.


Proof. All else is as above, but now define s_0 = γ + 1. You will need to argue
that (s_n)_{n<ω} is non-decreasing via a different route than before (Exercise VI.18).


The above says that normal On-sequences of ordinals have arbitrarily large
fixed points.


It is easy to see that the proof (imitating that of the theorem) yields the least
fixed point greater than γ (Exercise VI.18).


Theorem VI.5.42 can be sharpened in one direction, that is, dropping the
requirement of (countable) continuity. A small trade-off towards achieving this
is to restrict attention to PO sets (A, ≤) where every subset of A (not just
ascending sequences) has a supremum in A.


VI.5.46 Definition. Let (A, ≤) be a PO set. A total function f : A → A is
inclusive or expansive iff, for all x ∈ A, x ≤ f(x).


The terminology depends on which side of ≤ one is looking at. The input is
“expanded” by, or “included”† in, the output.


VI.5.47 Theorem. Let (A, ≤) be a PO set such that every S ⊆ A has a least
upper bound in A. If f : A → A is either inclusive or monotone, then it has a
fixpoint c ∈ A, that is, f(c) = c.


Proof. Let t = sup ∅, the minimum element of A. We define, by recursion over
On, the transfinite sequence s : On → A:

    s_α = f(sup{s_β : β < α})    (1)

α ↦ s_α is total on On. For convenience we set

    s_{<α} = sup{s_β : β < α}    (2)

This simplifies (1):

    s_α = f(s_{<α})    (3)

We now claim that α ↦ s_α is monotone.


† It is no more weird to pronounce a ≤ b (where ≤ is an arbitrary order) “a is included in b” than
to pronounce it “a is less than or equal to b”. Each version has an obvious “concrete” motivation.

Case 1. f is inclusive. Then s_{<α} ≤ s_α by (3); hence s_β ≤ s_α whenever
β < α (by (2)).

Case 2. f is monotone. By (2), β < α implies s_{<β} ≤ s_{<α}. Hence, by
monotonicity, f(s_{<β}) ≤ f(s_{<α}). That is, s_β ≤ s_α.


By collection, λα.s_α cannot be 1-1 (A is a set). Let us fix attention on some
β and γ such that β < γ and s_β = s_γ. By monotonicity of λα.s_α, if β ≤ α ≤ γ
then s_β = s_α; thus

    s_{<γ} = sup{s_α : α < γ} = s_β    (4)

We can now calculate as follows:

    s_β = s_γ
        = f(s_{<γ})
        = f(s_β),    by (4)

Thus, c = s_β works.
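The recursion s_α = f(sup{s_β : β < α}) can be traced on a finite lattice, where only finitely many stages are needed before a value repeats. The following Python sketch uses the powerset of {0, 1, 2} under inclusion (so sup is union) and a hypothetical map f that is inclusive but not monotone; the names sup, f, and fixpoint_by_stages are assumptions made for the illustration.

```python
def sup(sets):
    """Supremum in a powerset lattice: the union (sup of no sets is empty)."""
    return frozenset().union(*sets)


def f(s):
    """Inclusive map on subsets of {0, 1, 2}: adds the element len(s), if any.

    It is not monotone: f({}) = {0} is not a subset of f({1}) = {1}.
    """
    return s | {len(s)} if len(s) < 3 else s


def fixpoint_by_stages(f, max_stages=100):
    """Compute s_a = f(sup{s_b : b < a}) for a = 0, 1, 2, ... until it repeats."""
    stages = []                      # stages[b] plays the role of s_b
    for _ in range(max_stages):
        s_next = f(sup(stages))
        if stages and s_next == stages[-1]:
            return s_next            # s_b = s_{b+1}, hence a fixpoint of f
        stages.append(s_next)
    raise RuntimeError("no repetition found within max_stages")


c = fixpoint_by_stages(f)
print(sorted(c))
```

The stages are {0}, {0, 1}, {0, 1, 2}, and the next stage repeats the last one, which is the situation s_β = s_γ exploited in the proof.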


VI.5.48 Remark. (1) The reader can easily verify that we can weaken somewhat
the assumption on (A, ≤) and still prove our theorem. It suffices to postulate
that the PO set has a supremum for every chain.† Thus, (1) would be
undefined unless {s_β : β < α} is a chain. Well, one has to prove that it will be a
chain (under the changed assumptions for (A, ≤)) anyway (Exercise VI.19).


(2) It turns out that with the β and γ as fixed in the proof above,

    γ ≤ α → s_γ = s_α    (5)

One can prove (5) by induction, since the class of ordinals above γ is
well-ordered. Thus assume the claim for all α such that γ ≤ α < δ. Now,
monotonicity of λα.s_α and the I.H. entail

    s_{<δ} = sup{s_α : α < δ} = s_γ    (6)

Applying f to the two extreme sides of (6) and remembering that s_γ is a fixpoint
of f, we obtain s_δ = s_γ. In particular, for the γ that we fixed in the proof, we
have shown that s_γ = s_{<γ}.


Pause. Must it also be the case, for the β of the proof, that s_{<β} = s_β?


VI.5.49 Corollary. Restricting the assumptions of VI.5.47 to a monotone
f : A → A, there is a least fixpoint c for f. That is, f(c) = c, and if f(d) ≤ d,
then c ≤ d.


† That is, every set S ⊆ A that is totally ordered by ≤.

Proof. The c of the proof of VI.5.47 works. It suffices to prove that s_α ≤ d for
all α.

For α = 0, s_0 = f(sup ∅). Now, sup ∅ ≤ d; hence s_0 = f(sup ∅) ≤ f(d) ≤ d,
using monotonicity of f. In general, s_{<α} = sup{s_δ : δ < α} ≤ d by I.H. By
monotonicity, s_α = f(s_{<α}) ≤ f(d) ≤ d.


We conclude this section with the important Zermelo well-ordering principle 
and another theorem that is equivalent to AC. The connection here is that the 
proofs involve recursively defined transfinite sequences. 


VI.5.50 Theorem (Zermelo’s Well-Ordering Principle). Every set can be 
well-ordered. 


Proof. Let x be any set. If x = ∅, then the result is trivial. Assume then that
x ≠ ∅ and let f be a choice function (AC) on P(x) − {∅}, i.e.,

    (∀y)(∅ ≠ y ⊆ x → f(y) ∈ y)    (1)

By recursion over On (with α as the recursion variable) define

    (∀α) h(α) ≃ f(x − ran(h ↾ α))    (2)

It follows by VI.5.16 ((2) is a pure recursion) that dom(h) = On, or dom(h) = γ
for some γ. Now, h is 1-1, for if α < β and β ∈ dom(h) (so that α ∈ dom(h)
as well), then, by (1) and (2), h(β) ∈ x − ran(h ↾ β), and hence h(β) ≠ h(α).
Thus, by collection, since ran(h) ⊆ x, we have dom(h) = γ.


h is onto x, for dom(h) = γ entails that

    γ = min{δ : δ ∉ dom(h)}
      = min{δ : f(x − ran(h ↾ δ)) ↑}
      = min{δ : x − ran(h ↾ δ) = ∅}

That is, x = ran(h ↾ γ) = ran(h).


By VI.3.12, h induces a well-ordering <₁ on x such that ‖(x, <₁)‖ = γ.
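For a finite set x the recursion (2) needs only finitely many stages and no appeal to AC, but it still shows how a choice function enumerates x without repetition. In the Python sketch below, the function name well_order and the use of the built-ins max and min as choice functions are illustrative assumptions.

```python
def well_order(x, choice):
    """Enumerate x via h(a) = choice(x - ran(h restricted to a)).

    `choice` must return an element of any nonempty subset of x.
    The returned list h satisfies h[a] = h(a); its order well-orders x.
    """
    remaining = set(x)
    h = []
    while remaining:                  # h(a) is defined iff x - ran(h|a) is nonempty
        picked = choice(frozenset(remaining))
        h.append(picked)              # h is 1-1: picked was not used before
        remaining.remove(picked)
    return h                          # h is onto x: the loop stops when nothing is left


order = well_order({"b", "a", "c"}, choice=max)
print(order)
```

Different choice functions induce different well-orderings of the same set: with max the enumeration above is c, b, a, while with min it is a, b, c.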


VI.5.51 Remark. The above theorem is due to Zermelo (1904, 1908). The proof
in his 1904 paper is reproduced in Kamke (1950) (see also Exercise IV.3). It
is noteworthy that while AC was, of course, employed in an essential way in
the original proof, it was taken there to be a “fundamental truth” of set theory
rather than an additional assumption (axiom).


‡ See Kamke (1950, p. 112), especially the concluding remarks prior to the statement of the
well-ordering theorem.

Cantor had conjectured (but not proved) a special case of the well-ordering
theorem in 1883, where x is the set of reals, R.


We have remarked several times that AC holds on any WO set. Thus, 


VI.5.52 Corollary. AC is equivalent to the well-ordering principle (in the
presence of the remaining axioms).


Proof. If F (a set) is a family of nonempty sets, then let <₁ well-order ⋃F. To
each x ∈ F associate its <₁-minimum element x_min. The function x ↦ x_min is
a choice function on F.


The careful reader will observe that “remaining axioms” need not include
foundation or power set, as the ordinals can be developed without these
axioms.


VI.5.53 Corollary. AC is equivalent to the following: For every set x there is
an ordinal α and a function g with dom(g) = α and x ⊆ ran(g).


Proof. Assume AC. Let x ≠ ∅. Referring to the proof of VI.5.50, we can take
g = h and α = γ. If x = ∅, then α = 1 and g = {⟨0, 0⟩} work.

Conversely, let F be a nonempty family of nonempty sets. We take x = ⋃F
and let α and g be as described in the corollary.

A choice function for F is λy.g(min(g⁻¹[y])) ↾ F.


The following is important for the next chapter.


VI.5.54 Corollary. For every set x there is an ordinal α and a 1-1
correspondence between x and α.


VI.5.55 Theorem (Kuratowski-Zorn Theorem). If (A, ≤) is a PO set where
every chain has an upper bound, then for every a ∈ A there is a ≤-maximal
element c ∈ A such that a ≤ c.


The above is usually referred to as “Zorn’s lemma”.

Proof. Let f be a choice function on P(A) − {∅}. We define by recursion a
transfinite sequence λα.t_α:

    t_0 = a
    t_{α+1} ≃ f({x : t_α < x})                                          (1)
    t_α ≃ f({x : x is an upper bound of {t_β : β < α}})    if Lim(α)

(1) is a pure recursion; thus, dom(λα.t_α) = On or dom(λα.t_α) = θ for some
θ ∈ On. We will determine which one is the case.† Let us simply write “t” for
the function λα.t_α. We prove that t is increasing on dom(t), that is,

    α < β ∈ dom(t) → t_α < t_β    (2)

We do induction on β (the proof is entirely analogous to that of VI.5.38).

For the basis, the implication (2) is trivially provable if β = 0. Suppose now
that β = γ + 1. There are two cases:

One is where α = γ, and we are done by the second case in (1).

The other is where α < γ. The I.H. yields t_α < t_γ. But we also have t_γ < t_{γ+1}
by (1). We are done by transitivity of <.

Finally, let Lim(β). As remarked in the preceding footnote, α < β implies
α ∈ dom(t). Since t_β ↓, the third case in (1) yields that t_α ≤ t_β for any α < β.
For such an α, α + 1 < β as well; thus α + 1 ∈ dom(t) and t_{α+1} ≤ t_β. But
t_α < t_{α+1} by the second case in (1).

We must thus choose the case dom(t) = θ by collection. We claim that θ is
a successor (it is not 0, since t_0 = a). If not, Lim(θ). But then, {t_α : α < θ} is a
chain because t is increasing, thus t is defined at θ using the third case of (1),
contradicting the fact that dom(t) = θ.

Let then θ = η + 1. Then t_η ↓ and a ≤ t_η (why “≤” and not “<”?). Moreover,
t_η is ≤-maximal; otherwise the second case in (1) would define t_{η+1}.
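In a finite PO set only the successor case of recursion (1) ever fires, so the climb t_0 = a, t_{n+1} = f({x : t_n < x}) can be executed literally. The Python sketch below does so for the divisibility order on {1, ..., 12}; the order, the universe, and the names strictly_below and climb_to_maximal are assumptions chosen for the illustration, with the built-in min standing in for the choice function f.

```python
def strictly_below(x, y):
    """x < y in the divisibility order: x properly divides y."""
    return x != y and y % x == 0


def climb_to_maximal(a, universe, choice=min):
    """Follow t_0 = a, t_{n+1} = choice({x : t_n < x}) until no x lies above."""
    t = a
    while True:
        above = {x for x in universe if strictly_below(t, x)}
        if not above:          # the successor case is undefined: t is maximal
            return t
        t = choice(above)      # strictly increasing, so the loop must stop


UNIVERSE = range(1, 13)
print(climb_to_maximal(2, UNIVERSE))
```

Starting from 2 the sequence is 2, 4, 8, and 8 is maximal because no proper multiple of 8 lies in the universe. The limit case of (1) is what the transfinite argument adds for infinite chains.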


VI.5.56 Corollary (Hausdorff’s Theorem). In a PO set (A, ≤) every chain
B ⊆ A is included in some maximal chain C. That is, C is a chain, B ⊆ C,
and there is no chain D such that C ⊂ D.


Proof. Let us order the set of ≤-chains of A by inclusion. So we have ≤-chains
and ⊆-chains.

Now, the union of the members of a ⊆-chain (these are ≤-chains) is a
≤-chain. Indeed, let S be a ⊆-chain, and let x ∈ ⋃S and y ∈ ⋃S. Then
x ∈ B ∈ S and y ∈ B′ ∈ S. Without loss of generality let B ⊆ B′. Then x and
y are in B′ and thus are ≤-comparable. In short, ⋃S is a ≤-chain. It is trivially
a ⊆-upper bound of the members of S (≤-chains).

It follows by VI.5.55 that for any ≤-chain B there is a ⊆-maximal ≤-chain
C such that B ⊆ C.


† Note that (1) may yield undefined right hand sides in the second or third case of the definition,
because the set we are using as the argument of f is actually not in the domain of f (because it
is ∅). For example, this may happen in the third case if {t_β : β < α} is not a chain.

‡ Since dom(t) is On or an ordinal, it is transitive. Therefore, α ∈ β ∈ dom(t) implies α ∈ dom(t).
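For a finite PO set a maximal chain extending a given chain can be found greedily, without Zorn's lemma: scan the elements once and keep each one that is comparable with everything kept so far. The Python sketch below does this for divisibility on {1, ..., 12}; the order and the names comparable and extend_to_maximal_chain are illustrative assumptions.

```python
def comparable(x, y):
    """x and y are comparable in the divisibility order."""
    return x % y == 0 or y % x == 0


def extend_to_maximal_chain(chain, universe):
    """Greedily extend `chain` to a chain that no element of `universe` can enlarge."""
    result = set(chain)
    for x in sorted(universe):
        # x is added only if the enlarged set is still totally ordered.
        if all(comparable(x, y) for y in result):
            result.add(x)
    return result


C = extend_to_maximal_chain({2}, range(1, 13))
print(sorted(C))
```

The result is maximal because every rejected element was incomparable with some element that stays in the chain. For {2} the sketch yields the chain 1, 2, 4, 8.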


VI.5.57 Corollary. Hausdorff’s theorem is equivalent to Zorn’s lemma, and
thus to AC.


Proof. In view of the proof of VI.5.56, we need only prove that the latter
implies Zorn’s lemma. Let then (A, ≤) be a PO set where every chain has an
upper bound. Let a ∈ A. Now, {a} is a chain; thus there is a maximal chain
C ⊆ A such that a ∈ C. Let c be an upper bound of C. Then a ≤ c trivially.
Moreover, c is maximal, for if not, there is a b > c. But then C ∪ {b} is a chain
that properly extends C.


We have learnt a few of the properties of the transfinite sequence of ordinals
in this section, some of which will be conveniently used in the sequel, especially
in Section VI.10. There is a bit more that we will have to say in the latter section.
For the whole story, the advanced and curious reader should consult Levy (1979)
and the older references Bachmann (1955), Sierpiński (1965).


VI.6. The von Neumann Universe 


We now turn to formalizing “stages” of set construction within set theory, and to
studying what such formalization entails. We will define a transfinite sequence
of sets, built from an arbitrarily chosen set of urelements N, which when taken
together constitute a “universe” of sets and atoms of set theory built from N, in
the sense that all axioms of set theory hold in this universe. This construction is
effected within ZFC by recursion on ordinals, and for every α formally yields a
set, V_N(α), that consists of all sets (and atoms) that we can define at “stage” α.
The (proper) class U_N = ⋃_{α∈On} V_N(α) is all we can construct from the atoms
N, using the axioms.

If we mimic the formal construction outside ZFC, using “real” mathematical
objects for atoms and working within “real” mathematics, we may then
Platonistically proclaim that we have, at long last, constructed the “natural” model of
set theory, a model that we have only vaguely described in II.1.3.†


† But from which vague description we have firmly justified the selection of axioms of ZFC!

The reader is cautioned that the formal construction within ZFC does not
provide a formal proof of consistency of the axioms. What it does is build a
formal interpretation of L_Set and ZFC over the language L_Set and ZFC. Thus,
formally, it only proves that (cf. I.7.10) if ZFC is consistent, then ZFC is
consistent. Hardly newsworthy. Nevertheless, as we have noticed above,
Platonistically we get much more out of this construction.


VI.6.1 Definition (The Cumulative Hierarchy or von Neumann Universe).
Recall that we have been using the (rather unimaginative) name M formally,
having introduced it by the formal definition

    M = y ↔ ¬U(y) ∧ (∀x)(x ∈ y ↔ U(x))

or simply put, M = {x : U(x)}. For any N ⊆ M we define V_N(α) by induction
over On by

    V_N(0) = N
    V_N(α + 1) = P(N ∪ V_N(α))
    V_N(α) = ⋃_{β<α} V_N(β)    if Lim(α)

If N = ∅, then we omit the subscript “N” in V_N(α) and just write V(α).
We also define the sequence R_N(α) by

    R_N(0) = ∅
    R_N(α + 1) = P(N ∪ R_N(α))
    R_N(α) = ⋃_{β<α} R_N(β)    if Lim(α)

If N = ∅, then we omit the subscript “N” in R_N(α) and just write R(α).

We denote ⋃_{α∈On} R_N(α), the class of well-founded sets built from N, by
WF_N.
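The finite stages of this recursion can be computed explicitly, which gives a feel for how quickly the hierarchy grows. The Python sketch below models sets as frozensets and urelements as strings; the function names powerset and V and the atom "a" are illustrative assumptions, and only the zero and successor clauses are implemented since limit stages are not reachable by a finite loop.

```python
from itertools import combinations


def powerset(s):
    """All subsets of s, each represented as a frozenset."""
    items = list(s)
    return frozenset(
        frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)
    )


def V(n, atoms=frozenset()):
    """V_N(n) for a finite n: V_N(0) = N and V_N(k + 1) = P(N ∪ V_N(k))."""
    atoms = frozenset(atoms)
    stage = atoms
    for _ in range(n):
        stage = powerset(atoms | stage)
    return stage


sizes_pure = [len(V(n)) for n in range(5)]
sizes_one_atom = [len(V(n, {"a"})) for n in range(4)]
print(sizes_pure, sizes_one_atom)
```

With no urelements the stage sizes are 0, 1, 2, 4, 16, and V(5) already has 65536 members; with the single atom "a" they are 1, 2, 8, 512.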


VI.6.2 Remark. We view the above as a recursive definition of the functions
λα.V_N(α) and λα.R_N(α), effected for any set N of urelements. We can also
view it as defining λαN.V_N(α) and λαN.R_N(α) respectively, i.e., where N is
a parameter. We then explicitly arrange that the right hand side in each case is
equal to a “don’t care value” if N ⊆ M fails. We can use ∅ for this value.

Alternatively, we may define λαA.V_A(α) for any set A (parameter) but modify
the definition setting V_A(0) = TC(A) and V_A(α + 1) = P(TC(A) ∪ V_A(α)).

One can easily prove by induction on α that sp(V_N(α)) ⊆ N.


† We actually end up deriving a somewhat less trivial result.

In what follows we want to (formally) settle two claims:

One, that ⋃_{α∈On} V_N(α), for any initial set of urelements N, satisfies all
axioms of set theory. Or, Platonistically, that this union is a “universe” of sets
and atoms (built from N).

Two, that ⋃_{α∈On} V_N(α) contains all the objects (i.e., sets and urelements)
of set theory, if these objects are built from the initial set of atoms N, i.e.,
that ⋃_{α∈On} V_N(α) = {x : sp(x) ⊆ N}. Even though this will be done formally,
we want to point out its Platonist interpretation: If our recursive definition is
carried out in the realm of real mathematics, and if N happens to contain all the
atoms, then ⋃_{α∈On} V_N(α) contains everything: it is the class of all mathematical
objects, U_N.

The above inductive definition reflects the intuitive idea of the formation
of sets by stages. At stage 0 we collect a collection of atoms, which are given
outright, into a set via the equation V_N(0) = N. Subsequently, urelements are
used along with sets already built, in the second equation above, to build new
sets at a “powering stage” α + 1 (identified by a successor ordinal).

We also note that at a “collecting stage” α, identified by a limit ordinal,
we collect together all objects previously constructed or “donated” that are
scattered around in the various V_N(β) for β < α. Thus, ordinals serve as
(formal) “stages” of set construction.

Comparing this formal definition with that in Section IV.2, we note that we
deviate from the latter in that we include all subsets of N ∪ V_N(α) at a powering
stage, not only the “definable” (or “constructible”) ones.


VI.6.3 Lemma (V_N vs. R_N). For successor ordinals α, V_N(α) = R_N(α). For all
other ordinals, V_N(α) = N ∪ R_N(α).


Proof. A trivial induction: For α = 0, V_N(0) = N = N ∪ R_N(0). For Lim(α),

    V_N(α) = ⋃_{β<α} V_N(β)
           = N ∪ ⋃_{β<α} R_N(β)    since V_N(β) ∈ {R_N(β), N ∪ R_N(β)} by I.H.
           = N ∪ R_N(α)

For α = β + 1,

    V_N(β + 1) = P(N ∪ V_N(β))
               = P(N ∪ R_N(β))    by I.H.
               = R_N(β + 1)


Thus, N ∪ WF_N = ⋃_{α∈On} V_N(α). In WF_N we collect only the sets built from
N, and leave “loose” urelements out of the collection. As in III.4.20, V_N will
denote the class of all sets built from N. The question whether WF_N = V_N is
a subsidiary of the second claim made in VI.6.2, and will be settled shortly.
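The relation between the two sequences can be checked mechanically at small finite stages. The Python sketch below builds both V_N(n) and R_N(n) for a single hypothetical atom "a" (sets are frozensets, the atom is a string) and compares them; the helper names powerset and stage, like the atom, are assumptions for the illustration.

```python
from itertools import combinations


def powerset(s):
    """All subsets of s, each represented as a frozenset."""
    items = list(s)
    return frozenset(
        frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)
    )


def stage(n, atoms, start):
    """Iterate X -> P(N ∪ X) n times, starting from `start`."""
    x = frozenset(start)
    for _ in range(n):
        x = powerset(atoms | x)
    return x


N = frozenset({"a"})


def V(n):
    return stage(n, N, start=N)             # V_N(0) = N


def R(n):
    return stage(n, N, start=frozenset())   # R_N(0) = ∅


# Stage 0 is not a successor: V_N(0) = N ∪ R_N(0).
zero_ok = V(0) == N | R(0)
# Stages 1, 2, 3 are successors: V_N(n) = R_N(n).
successors_ok = all(V(n) == R(n) for n in (1, 2, 3))
print(zero_ok, successors_ok)
```

The two sequences differ only at stage 0 here, since limit stages are beyond the reach of a finite computation.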


VI.6.4 Proposition. N ∪ V_N(α) is transitive for all α.


Proof. By induction on α: N ∪ V_N(0) = N is transitive, for x ∈ y ∈ N is
refutable.

Consider N ∪ V_N(α + 1), on the I.H. that N ∪ V_N(α) is transitive. Let then
x ∈ y ∈ N ∪ V_N(α + 1); hence (why?) x ∈ y ∈ V_N(α + 1); therefore x ∈ y ⊆
N ∪ V_N(α), so that x ∈ N ∪ V_N(α). If x ∈ N, then x ∈ N ∪ V_N(α + 1) and we
are done; otherwise, x ⊆ N ∪ V_N(α) by I.H. and hence x ∈ V_N(α + 1).

Let finally Lim(α), and (I.H.) assume that all N ∪ V_N(β) are transitive for
β < α. Suppose that x ∈ y ∈ N ∪ V_N(α) = N ∪ ⋃_{β<α} V_N(β). Thus,

    x ∈ y ∈ ⋃_{β<α} V_N(β)

so that x ∈ y ∈ V_N(β) for some β < α. By I.H., x ∈ N ∪ V_N(β) ⊆ N ∪ V_N(α).


It follows that ⋃_{α∈On} V_N(α) = N ∪ WF_N is transitive as well. Note however
that WF_N is not transitive (unless N = ∅), since for any urelement p ∈ N, p ∈ N ∈
R_N(1) ⊆ WF_N, yet p ∉ WF_N.


VI.6.5 Corollary. If there are no urelements (i.e., N = ∅), then R_N(α) =
V_N(α) is transitive for all α.


VI.6.6 Corollary. (∀α)(∀β < α) V_N(β) ⊆ N ∪ V_N(α).


Proof. We do induction on α. For α = 0 the statement is vacuously satisfied.

The case for α + 1, on the I.H. that the claim holds for α: Let us first
consider β = α < α + 1. Take x ∈ V_N(α). If x ∈ N, then we are done; else
x ⊆ N ∪ V_N(α) by VI.6.4, whence x ∈ V_N(α + 1). Thus

    V_N(α) ⊆ N ∪ V_N(α + 1)    (1)

Let next β < α. The I.H. yields V_N(β) ⊆ N ∪ V_N(α); hence V_N(β) ⊆ N ∪
V_N(α + 1) by (1).

The case Lim(α): Here already V_N(β) ⊆ V_N(α), even without the help
of I.H.


VI.6.7 Corollary. (∀α)(∀β < α) V_N(β) ∈ V_N(α).


Proof. Induction on α. For α = 0 the claim is vacuously satisfied.

For α + 1, V_N(α + 1) = P(N ∪ V_N(α)); hence

    V_N(α) ∈ V_N(α + 1)    (1)

If, in general, β < α + 1, then it remains to consider β < α. By I.H., V_N(β) ∈
V_N(α). By (1) and VI.6.4, V_N(β) ∈ V_N(α + 1).

The case Lim(α): Let β < α, hence β + 1 < α. Now V_N(β + 1) ⊆
⋃_{γ<α} V_N(γ) = V_N(α); thus, by (1), V_N(β) ∈ V_N(α) (the I.H. was not needed
here).


VI.6.8 Corollary. (∀α) α ⊆ V_N(α).


Proof. Induction on α. For α = 0 the claim is trivial.

For α + 1, V_N(α + 1) = P(N ∪ V_N(α)) ∋ α, by I.H. By VI.6.4, α ⊆
N ∪ V_N(α + 1); hence α ⊆ V_N(α + 1), since ordinals are pure sets (VI.5.21).
All in all, α + 1 = α ∪ {α} ⊆ V_N(α + 1).

The case Lim(α): β ⊆ V_N(β) ⊆ V_N(α) for all β < α, by I.H. Thus, α =
⋃α = ⋃{β : β < α} ⊆ V_N(α), where the first = is justified in Exercise VI.13.


VI.6.9 Corollary. β ∈ V_N(α) iff β < α.


Proof. If part. β ∈ α ⊆ V_N(α) (by VI.6.8) implies β ∈ V_N(α).

Only-if part. Induction on α. For α = 0 the claim is vacuously satisfied,
because β ∈ N is refutable.

For α + 1: Let β ∈ V_N(α + 1). Thus

    β ⊆ V_N(α)    (1)

since ordinals are pure sets. Why should β < α + 1, that is, β ≤ α, or β ⊆ α?
Well, let γ ∈ β. Thus γ ∈ V_N(α) by (1). By I.H., γ ∈ α.

The case Lim(α): Let β ∈ V_N(α) = ⋃{V_N(γ) : γ < α}. So β ∈ V_N(γ) for
some γ < α; hence, by I.H., β < γ. Thus, β < α.


VI.6.10 Corollary. On ⊆ ⋃_{α∈On} V_N(α).


VI.6.11 Remark. By VI.6.6 we have a hierarchy of sets V_N(α), in the sense
that as the stages, α, of the construction progress, we obtain more and more
inclusive sets.

The hierarchy is proper, that is, V_N(β) ⊂ N ∪ V_N(α) if β < α, i.e., the
construction keeps adding new stuff. This is because of β ∈ V_N(α) − V_N(β)
(by VI.6.9). Alternatively, if V_N(β) = N ∪ V_N(α) for some β < α, then,
by VI.6.7, V_N(β) ∈ V_N(β).
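Both VI.6.9 and the properness of the hierarchy can be observed at the first few finite stages. The Python sketch below takes N = ∅, represents the finite von Neumann ordinals as nested frozensets, and tests membership in V(α) for α ≤ 4; the names powerset, V, and ordinal are assumptions made for the illustration.

```python
from itertools import combinations


def powerset(s):
    """All subsets of s, each represented as a frozenset."""
    items = list(s)
    return frozenset(
        frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)
    )


def V(n):
    """V(n) with no urelements: V(0) = ∅ and V(k + 1) = P(V(k))."""
    stage = frozenset()
    for _ in range(n):
        stage = powerset(stage)
    return stage


def ordinal(n):
    """The finite von Neumann ordinal n = {0, 1, ..., n - 1}."""
    o = frozenset()
    for _ in range(n):
        o = o | {o}          # successor: o ∪ {o}
    return o


# VI.6.9 at finite stages: b ∈ V(a) iff b < a.
membership_ok = all(
    (ordinal(b) in V(a)) == (b < a) for a in range(5) for b in range(6)
)
# VI.6.7 at finite stages: V(b) ∈ V(a) whenever b < a.
stages_ok = all(V(b) in V(a) for a in range(5) for b in range(a))
print(membership_ok, stages_ok)
```

In particular the ordinal β witnesses that V(β + 1) contains something V(β) does not, which is the properness argument of the remark.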


At the end of all this, have we got enough sets to “do set theory”? In other
words, are all the axioms of set theory true in the “real”

    ⋃_{α∈On} V_N(α)    (A)

or, formally, are the axioms provable when relativized to ⋃_{α∈On} V_N(α)
(cf. Section I.7)? And are these all the sets we can get if we start with a set N
of atoms? That is, is it the case that

    U_N = ⋃_{α∈On} V_N(α)    (B)

Let us address these two questions in order. First a lemma.


VI.6.12 Lemma. For any set x, x ⊆ N ∪ WF_N implies x ∈ N ∪ WF_N.


Proof. If x = ∅, then x ∈ V_N(1). So assume that x ≠ ∅, and let α_y denote, for
each y ∈ x, the smallest α such that y ∈ V_N(α). By collection, S = {α_y : y ∈ x}
is a set; hence, sup S exists (by VI.5.22). Say it is β. But then (by VI.6.6),
x ⊆ N ∪ V_N(β); hence x ∈ V_N(β + 1).


VI.6.13 Theorem. For any initial set of urelements N, the class N ∪ WF_N =
⋃_{α∈On} V_N(α) satisfies all the axioms of set theory, that is,

    J = (L_Set, ZFC, N ∪ WF_N)

is a formal model of ZFC.

We are somewhat simplifying the notation in our applications of the material
of Section I.7. Thus, we have omitted the last component, “I”, in “J =
(L_Set, ZFC, N ∪ WF_N)”. If we ever need to write “P^I” (for some predicate P),
we will write “P^{N∪WF_N}” instead. Two more simplifications in our notation are:

(1) We wrote L_Set, but we mean here the basic language augmented by the
    various defined symbols we have introduced to this point.

(2) We wrote the class term N ∪ WF_N rather than its defining formula.


Proof. The reader may wish to review the concepts in Section I.7.

Now, the requirement that

    ⊢_ZFC (∃x)(x ∈ N ∪ WF_N)

is trivially met by ⊢_ZFC 0 ∈ N ∪ WF_N (cf. VI.6.10) and the substitution axiom.

We interpret now our two nonlogical symbols, U and ∈. We interpret both
as “themselves”. That is, ∈^{N∪WF_N} = ∈ and U^{N∪WF_N} = U.

A moment’s reflection (cf. I.7.4) shows that U(x) is “true in N ∪ WF_N”† iff
x ∈ N. Indeed, ⊨_J U(x) is short for

    ⊢_ZFC x ∈ N ∪ WF_N → U(x)

Thus x ∈ WF_N is untenable. Therefore x ∈ N. The other direction is trivial.
Correspondingly, “A is a set” translates to “A is a set and A ∈ N ∪ WF_N.”

Next, to facilitate the argument that follows, we look at the (defined) symbol
“⊆”: Suppose that A, B are in N ∪ WF_N. What will the interpretation of A ⊆ B,
that is, of

    (∀x)(x ∈ A → x ∈ B)    (1)

be? It will be

    (∀x ∈ N ∪ WF_N)(x ∈ A → x ∈ B)    (2)

Trivially, (1) implies (2). Interestingly, (2) implies (1): Indeed, to prove (1)
(from (2)), let x ∈ A. Since also A ∈ N ∪ WF_N, we get x ∈ N ∪ WF_N by
transitivity of N ∪ WF_N. Then x ∈ B by (2).

Thus, Platonistically, ⊆ has the same meaning in N ∪ WF_N as in the whole
universe U_N.


† It is easier expositionally to refer to N ∪ WF_N, meaning really J. The jargon “true in” was
introduced (with apologies) on p. 80.

We now turn to the verification of all the axioms in N ∪ WF_N.

(i) The axiom (∃x)(∀y)(U(y) ↔ y ∈ x) translates to

        (∃x ∈ N ∪ WF_N)(∀y ∈ N ∪ WF_N)(U(y) ↔ y ∈ x)

    Since N ∈ N ∪ WF_N is provable (e.g., VI.6.7), to prove the above it
    suffices to argue the case for

        (∀y ∈ N ∪ WF_N)(U(y) ↔ y ∈ N)

    But this we have already done.

(ii) That the axiom U(x) → (∀y) y ∉ x is “true in N ∪ WF_N” means

        ⊢_ZFC x ∈ N ∪ WF_N → (U(x) → (∀y ∈ N ∪ WF_N) y ∉ x)

    which is a tautological consequence of U(x) → (∀y ∈ N ∪ WF_N) y ∉ x,
    and the latter trivially follows in ZFC from U(x) → (y ∈ N ∪ WF_N →
    y ∉ x), itself a tautological consequence of U(x) → y ∉ x.†

(iii) Axiom of extensionality. It says that for any sets A and B,

        A ⊆ B ∧ B ⊆ A → A = B

    Now, for any sets A, B in N ∪ WF_N, the relativization of this is provable
    in ZFC, since “=” is logical, and we saw above (the equivalence of
    (1) and (2)) that “⊆” relativizes over N ∪ WF_N as itself.

(iv) Axiom of separation. It says that for any set B and class A, A ⊆ B implies
    that A is a set. To see why this is true in N ∪ WF_N,‡ let B ∈ N ∪ WF_N and
    A ⊆ B. Thus, in ZFC, A is a set (recall the invariance of meaning of “⊆”).
    To prove that (A is a set)^{N∪WF_N}, equipped with our preliminary remarks
    at the onset of the proof, we only need to show that A ∈ N ∪ WF_N. Now,
    B ∈ V_N(α) for some α; hence (VI.6.4), A ⊆ B ⊆ N ∪ V_N(α). Therefore,
    A ∈ V_N(α + 1); hence A ∈ N ∪ WF_N.

(v) Axiom of pairing. For any a, b in N ∪ WF_N one must find a set C ∈
    N ∪ WF_N such that a ∈ C and b ∈ C. We can take C = {a, b},
    since a ∈ V_N(α) and b ∈ V_N(β) implies (using VI.6.6) that {a, b} ∈
    V_N(max(α, β) + 1).

(vi) Axiom of union. For any set A ∈ N ∪ WF_N we need to show that there
    is a set B ∈ N ∪ WF_N such that, for all sets x in N ∪ WF_N,§

        x ∈ A → x ⊆ B

    We can take B = N ∪ V_N(α), where A ∈ V_N(α). Indeed, x ∈ A implies
    x ∈ N ∪ V_N(α) by VI.6.4; hence, x being a set, x ⊆ N ∪ V_N(α) by VI.6.4
    again. Of course, B ∈ N ∪ WF_N.


† ⊢ (∀y)(𝒜 → ℬ) ↔ (𝒜 → (∀y)ℬ), provided y is not free in 𝒜.

‡ Armed with I.7.4, the reader will not be confused by the frequent occurrence of the argot “is true
in N ∪ WF_N” in this proof.

§ Note that we have translated ⊆ by itself, due to (1) and (2).

(vii) Power set axiom. We need to show that for any set A ∈ N ∪ WF_N there
    is a set B ∈ N ∪ WF_N such that, for all x ∈ N ∪ WF_N,

        x ⊆ A → x ∈ B.

    We can take B = V_N(α + 1), where A ∈ V_N(α), since A ⊆ N ∪ V_N(α)
    by VI.6.4.

(viii) Collection. We need to show the truth of

        (∀x ∈ A)(∃y)ℱ[x, y] → (∃z)(∀x ∈ A)(∃y ∈ z)ℱ[x, y]

    in N ∪ WF_N, that is, for any A and formula ℱ,

        A ∈ N ∪ WF_N and (∀x ∈ A)(∃y ∈ N ∪ WF_N)ℱ[x, y]    (3)

    imply within ZFC that there is a set† z in N ∪ WF_N such that

        (∀x ∈ A)(∃y ∈ z)ℱ[x, y]    (4)

    So assume (3). This we view as (∀x ∈ A)(∃y)𝒢[x, y], where “𝒢[x, y]” is
    “y ∈ N ∪ WF_N ∧ ℱ[x, y]”. We can now apply collection (in ZFC) to
    obtain a set B (in ZFC) such that

        (∀x ∈ A)(∃y ∈ B)𝒢[x, y]

    or

        (∀x ∈ A)(∃y ∈ B)(y ∈ N ∪ WF_N ∧ ℱ[x, y])    (5)

    By (5) and Lemma VI.6.12, z = B ∩ (N ∪ WF_N) is what we want
    for (4).

(ix) Axiom of infinity. We need an inductive set in N ∪ WF_N. Since ω ∈
    N ∪ WF_N by VI.6.10, we are done.

(x) AC. Let S be a set of nonempty sets in N ∪ WF_N. By AC (in ZFC),
    there is a choice function, f : S → ⋃S, such that f(x) ∈ x for all
    x ∈ S. We want to show that f is in N ∪ WF_N. Now by (iv) and (vi),
    ⋃S ∈ N ∪ WF_N; hence S × ⋃S ∈ N ∪ WF_N using (iv), (vi), and (vii),
    since S × ⋃S ⊆ P(S ∪ P(S ∪ ⋃S)). Thus, f ⊆ S × ⋃S implies
    f ∈ N ∪ WF_N.

(xi) Axiom of foundation. We left this till the very end for a special reason.
    We will show that N ∪ WF_N would satisfy foundation even if we did
    not include foundation in ZFC. Let then ∅ ≠ 𝔸 ⊆ N ∪ WF_N. Take
    α = min{β : 𝔸 ∩ V_N(β) ≠ ∅}, and pick an A ∈ 𝔸 ∩ V_N(α) (auxiliary
    constant). If A is an urelement, then

        (∃x ∈ 𝔸)(∀y ∈ 𝔸) y ∉ x

    is provable, since

        (∀y ∈ 𝔸) y ∉ A    (6)

    is. If A is a set, then (6) is still provable (in ZFC without foundation):
    First, A ∉ V_N(γ) if γ < α. Thus, α = δ + 1 for some δ (why?);
    hence A ⊆ N ∪ V_N(δ). Then, y ∈ A implies y ∈ N ∪ V_N(δ), so that
    y ∈ V_N(max(0, δ)); hence y ∉ 𝔸 (else α ≤ max(0, δ)).


† Cf. 11.8.4.


Part (xi) in the proof above was carried out without foundation. Indeed, the 
whole theorem can be proved without foundation, in “ZFC — f” — where “f” 
stands for foundation. This is due to the feasibility of basing everything in 
the proof on the Kuratowski pairing, (x, y) = {{x}, {x, y}}, while defining 
ordinals as in VI.4.25. Indeed, everything we have said up until now (except for 
the examples regarding the properties of the collapsing function) can be said 
without the benefit of foundation. 

Thus, the whole construction has built more than we were willing to admit 
initially. We have built a formal model of ZFC in 3’ = (Lge, ZFC — f, NUWF y) 
rather than in “just” J = (Lse, ZFC, N U WF y). 1.7.10 yields at once: 


VI.6.14 Corollary. /f ZFC without foundation is consistent, then so is ZFC. 


But we do have foundation — that is, its suspension was only temporary, to 
obtain VI.6.14. 


VI.6.15 Theorem. U_N = ⋃_{α∈On} V_N(α).


Proof. By U_N we understand, of course, {x : sp(x) ⊆ N} (cf. VI.6.2). The
⊇-part is trivial, so we address the ⊆-part. Let us use ∈-induction over U_N,
so let x ∈ U_N, and assume (I.H.) that for all y ∈ x, y ∈ N ∪ WF_N. Thus,
x ⊆ N ∪ WF_N. By VI.6.12, x ∈ N ∪ WF_N.


VI.6.16 Corollary. WF_N = V_N.


‡ Once again we point out that there is no circularity in this assertion, for ordinals can be defined
without the presence of foundation (see the discussion following VI.4.25). Another revision that
one needs to make in this (temporary!) rewriting of our development is the definition of (x, y).
To avoid foundation one defines (x, y) = {{x}, {x, y}}.

VI.6.17 Corollary. Foundation is equivalent to the statement V_N = WF_N.

See also the discussion in II.1.4.

Proof. That foundation implies V_N = WF_N was the content of the proof of
VI.6.15. Conversely, if V_N = WF_N holds, then foundation holds, since it is a
theorem of ZFC − f in WF_N.


One way to put all this is that if we drop the foundation axiom, then
WF_N ⊆ V_N
The sets in V_N − WF_N are the hypersets (see Barwise and Moss (1991)).


We started off in Chapter II by proposing — after Russell — that sets be formed 
in stages. The concept of “stage” was necessarily vague there, yet it assisted 
us to choose a small group of “reasonable axioms” on which we based all our 
deductions hence. We have now come to a point that “stages” can be formalized 
within the theory! 


VI.6.18 Definition. We say that a set x is formed from N at stage† α iff x ∈
R_N(α) (iff x ∈ V_N(α), since x is a set).


Thus R_N(α) is the set of all sets formed at stage α.


Principle 0 of Chapter II says that "an arbitrary class is a set formed (from N)
at some stage if all its members are formed (set-members) or given (urelement-
members) at some earlier stage", and Principle 1 says that "every set is
constructed at some stage". All this has now become formally true (with our final
interpretation of what "stage" means).

Indeed, if the set x is in V_N(α) and α is smallest ("earliest"), then α = β + 1
(why?); hence x ⊆ N ∪ V_N(β) = N ∪ R_N(β). That is, all the elements of x are
formed (∈ R_N(β)) or given (∈ N) at some earlier stage.

Conversely, if it is known for a class A that all y ∈ A satisfy y ∈ V_N(α_y),
and if we are told that there is a stage after all the α_y (that is, sup{α_y : y ∈ A}
exists and equals, say, β), then A ⊆ ⋃_{y∈A} V_N(α_y) ⊆ N ∪ V_N(β) (the last ⊆
by VI.6.6). Hence A ∈ V_N(β + 1), i.e., A is constructed at stage β + 1 from N,
as a set. This formalizes Principle 2.


Principle 1 is formalized as VI.6.16. 


† This is meant in the non-strict sense. α need not be the earliest stage (i.e., smallest ordinal) at
which x is formed.

VI.6.19 Definition (Rank of an Object in U_N). The rank of an object x in
U_N, ρ_N(x), is the earliest stage α at which x is formed from N, in the sense
ρ_N(x) = min{α : x ∈ V_N(α)}.


VI.6.20 Remark. (1) We will normally suppress the subscript N on ρ. See
also VI.6.24 below.


(2) Definition VI.6.19 wants ρ(x) to be the earliest stage α at which we can
place the set or urelement x as a member of V_N(α). This deviates from the
standard definition given in most of the literature, where the rank is defined as
the smallest α such that x ⊆ V_N(α).


We prefer VI.6.19 (the adoption of which affects computation of ranks, but
no other theoretical results) because we find it aesthetically more pleasing not
to have both sets and urelements at stage 0. The alternate definition gives rank
0 to ∅, as it does to any atom; see for example Barwise (1975). With respect to
the literature that admits no urelements in the theory, our objection disappears.


(3) The adoption of VI.6.19 makes ρ(x) a successor for all sets x. Indeed,
if Lim(ρ(x)) (why can it not be that ρ(x) = 0?), then, as V_N(ρ(x)) =
⋃{V_N(α) : α < ρ(x)}, we would get x ∈ V_N(α) for some α < ρ(x).


(4) VI.6.19 yields ρ(α) = α + 1 (see below), while the standard rank, "rk",
would have rk(α) = α (Exercise VI.24).


VI.6.21 Proposition. (∀α) ρ(α) = α + 1.


Proof. By VI.6.9, α ∈ V_N(α + 1) − V_N(δ), where δ ≤ α.


VI.6.22 Example. If x ⊆ N, where N is a set of urelements, then ρ_N(x) = 1.
Well, first, x ∉ N = V_N(0), and second, x ∈ V_N(1) = P(N ∪ V_N(0)).


In particular, ρ_N(∅) = 1.


VI.6.23 Proposition. For any sets x, y,


(i) x ∈ y implies ρ(x) < ρ(y),
(ii) x ⊆ y implies ρ(x) ≤ ρ(y).


Proof. (i): Let x ∈ y and ρ(y) = α + 1. Then x ∈ y ∈ P(N ∪ V_N(α));
hence x ∈ N ∪ V_N(α), from which ρ(x) ≤ max(0, α).

(ii): Let x ⊆ y and ρ(y) = α + 1. Here, x ⊆ N ∪ V_N(α); hence x ∈ V_N(α + 1);
thus ρ(x) ≤ α + 1.
VI.6.24 Proposition. ρ_N satisfies the recurrence equations

ρ_N(x) = 0                        if x ∈ N
ρ_N(x) = (⋃_{y∈x} ρ_N(y)) + 1     otherwise


Proof. N = V_N(0) settles the first equation. Let now x be a set and α =
⋃_{y∈x} ρ_N(y), while ρ_N(x) = β + 1. Since ρ_N(y) ≤ α for all y ∈ x, we get
(∀y ∈ x) y ∈ N ∪ V_N(α) (by VI.6.6); hence x ⊆ N ∪ V_N(α); thus x ∈ V_N(α + 1).
This yields

β + 1 ≤ α + 1    (1)

By VI.6.23, (∀y ∈ x) ρ_N(y) < β + 1; hence (∀y ∈ x) ρ_N(y) ≤ β. Thus α ≤ β;
hence α + 1 ≤ β + 1. Using (1), we get the second equation.


The above recurrence shows that the dependence of ρ_N on N is straightforward
(initialization), which justifies our lack of caution in suppressing the
subscript N.


VI.6.25 Example. Let us re-compute ρ(∅): ρ(∅) = (⋃_{y∈∅} ρ(y)) + 1 = 0 + 1 = 1.

Next, let us compute again ρ(x) for ∅ ≠ x ⊆ N (where N is our initial atom
set): ρ(x) = (⋃_{y∈x} ρ(y)) + 1 = (⋃_{y∈x} 0) + 1 = 0 + 1 = 1.
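
The recurrence of VI.6.24 can be tried out on hereditarily finite objects. The sketch below is only an illustration under our own modelling assumptions (not part of the text): atoms are represented by Python strings, sets by frozensets, and the names `rank` and `ordinal` are hypothetical helpers.

```python
# Assumption: atoms (members of N) are strings, sets are frozensets.

def rank(x):
    """rho_N(x) via VI.6.24: 0 for an atom, else (sup of member ranks) + 1."""
    if isinstance(x, str):              # x is in N
        return 0
    # The sup over an empty family is 0, so rank(emptyset) = 0 + 1.
    return max((rank(y) for y in x), default=0) + 1


def ordinal(n):
    """The finite von Neumann ordinal n = {0, 1, ..., n - 1}."""
    result = frozenset()
    for _ in range(n):
        result = result | {result}      # successor: k + 1 = k U {k}
    return result


print(rank(frozenset()))                # the empty set
print(rank(frozenset({"a", "b"})))      # a nonempty set of atoms
print(rank(ordinal(4)))                 # agrees with rho(n) = n + 1
```

For finite ordinals this reproduces the pattern ρ(α) = α + 1 of VI.6.21; it says nothing, of course, about infinite ranks.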


VI.6.26 Example. Let us rediscover the identity ρ(α) = α + 1, using induction
over On in connection with VI.6.24. Use ρ(β) = β + 1 for β < α as I.H.


Then ρ(α) = (⋃_{β∈α} (β + 1)) + 1 = sup⁺(α) + 1 = α + 1.


VI.6.27 Example. Suppose that f is a function with dom(f) ⊆ ω + 1, such that
f(ω)↓. Let us estimate f's rank. We know that (ω, x) ∈ f for some x. Now,

ρ((ω, x)) = ρ({ω, {ω, x}})
          = max(ω + 1, ρ({ω, x})) + 1
          = max(ω + 1, max(ω + 1, ρ(x)) + 1) + 1
          ≥ ω + 3‡


By VI.6.23, ρ((ω, x)) < ρ(f); thus, f ∉ V_N(ω + 3).


This entails, in particular, that an inhabitant of V_N(ω + 3) would be oblivious
to the fact that there is a 1-1 correspondence ω ~ ω + 1, since even though ω
and ω + 1 are "visible" in V_N(ω + 3), the 1-1 correspondence is not.


‡ In anticipation of ordinal arithmetic in Section VI.10, we are taking here the notational liberty
to write things such as "α + 3" for ((α + 1) + 1) + 1.
VI.6.28 Example. Next, assume Lim(α), and let β ≤ γ, both in V_N(α), and
f : γ → β be a 1-1 correspondence. Is f ∈ V_N(α)?


f ⊆ γ × β. Now,

ρ(γ × β) = (⋃_{(δ,η)∈γ×β} ρ((δ, η))) + 1
         = (⋃_{(δ,η)∈γ×β} max(δ + 3, η + 3)) + 1
         ≤ γ + 3, since max(δ + 3, η + 3) ≤ γ + 2


Thus, ρ(f) ≤ γ + 3 by VI.6.23, so that f ∈ V_N(γ + 3) and hence f ∈ V_N(α),
since γ + 3 < α. An inhabitant of V_N(α) will witness the fact that γ ~ β.
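
The rank computations for ordered pairs and products can be checked on finite ordinals. This is a sketch under our own assumptions: sets are frozensets, the pair is the text's (x, y) = {x, {x, y}}, and `rank`, `ordinal`, `pair`, `product_set` are hypothetical helper names.

```python
def rank(x):
    """Rank by the recurrence of VI.6.24 (atoms would be strings)."""
    if isinstance(x, str):
        return 0
    return max((rank(y) for y in x), default=0) + 1


def ordinal(n):
    """Finite von Neumann ordinal n."""
    result = frozenset()
    for _ in range(n):
        result = result | {result}
    return result


def pair(x, y):
    """Ordered pair (x, y) = {x, {x, y}} as used in the text."""
    return frozenset({x, frozenset({x, y})})


def product_set(A, B):
    """Cartesian product A x B built from the pairs above."""
    return frozenset(pair(x, y) for x in A for y in B)


# rho((delta, eta)) = max(delta + 3, eta + 3) for finite delta, eta
print(rank(pair(ordinal(2), ordinal(1))))
# rho(gamma x beta) <= gamma + 3 when beta <= gamma
print(rank(product_set(ordinal(3), ordinal(2))))
```

Only finite ordinals are exercised here; the limit-ordinal argument of the example is not something a finite computation can confirm.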


What else is the rank function good for? 


It allows us to formalize the argument in II.8.3(1) to prove that collection
follows from the following restricted axiom (replacement axiom), and therefore
it is equivalent to it. This result strengthens II.8.12 by proving that the last
version ((1) below) implies the first (III.8.2) (see p. 173).


(∀x ∈ A)(∃!y) ℱ(x, y) → (∃z)(∀x ∈ A)(∃y ∈ z) ℱ(x, y)    (1)


To avoid repetitiousness in the arguments that follow, let us show once and 
for all that 


VI.6.29 Proposition. (Tarski (1955).) For every class B there is a set b ⊆ B
such that


(1) B ≠ ∅ → b ≠ ∅,


(2) b can be given as a set term in terms of B.


Proof. Let us define b = ∅ if B = ∅. Otherwise, let us collect in b all those
members of B of least rank (compare with the idea developed in III.8.3(1)).
That is,


b = {x ∈ B : (∀y ∈ B) ρ(x) ≤ ρ(y)}    (i)


By (i), b ⊆ B. By assumption on B, there is a minimum α such that ∅ ≠
B ∩ V_N(α). Thus, b ≠ ∅ and (∀x ∈ b) ρ(x) = α (if some x, y in b have
ρ(x) < ρ(y), then, as x, y are in B as well, we must also have ρ(y) ≤ ρ(x),
untenable). Thus, b ⊆ V_N(α); hence it is a set.


(i) is the "computation" of b as a set term in terms of B.
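
The "least rank" selection in (i) is easy to imitate on a finite collection. The following is a toy sketch under our own assumptions (atoms as strings, sets as frozensets, hypothetical names `rank` and `least_rank_part`); the point of the proposition, that b is a set even when B is a proper class, has no finite analogue.

```python
def rank(x):
    """Rank by the recurrence of VI.6.24; atoms are strings."""
    if isinstance(x, str):
        return 0
    return max((rank(y) for y in x), default=0) + 1


def least_rank_part(B):
    """b = {x in B : rank(x) <= rank(y) for all y in B}, as in (i)."""
    if not B:
        return frozenset()
    alpha = min(rank(x) for x in B)     # the minimum stage meeting B
    return frozenset(x for x in B if rank(x) == alpha)


empty = frozenset()
B = {empty, frozenset({empty}), frozenset({"a"}), frozenset({frozenset({empty})})}
b = least_rank_part(B)
print(b == frozenset({empty, frozenset({"a"})}))
```

All members of b share one rank, which is the observation the proof uses to place b inside a single V_N(α).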
We now turn to show that "replacement" ((1) above) implies collection (there
is no circularity in this, for anywhere that we have used collection, the restricted
form (1) sufficed).


Assume then


(∀x ∈ A)(∃y) ℱ(x, y)    (2)


Thus, there is a nonempty class B_x = {y : ℱ(x, y)} for each x ∈ A.


Let, for each x ∈ A, b_x ≠ ∅ be the set "computed" by VI.6.29(i). Using
class notation for readability, (2) translates into


(∀x ∈ A)(∃y) y ∈ B_x    (2′)

We have by VI.6.29


(∀x ∈ A)(∃!y) y = b_x    (3)


Why "∃!"? Because, referring to the proof of VI.6.29, there is only one
minimum α such that B_x ∩ V_N(α) ≠ ∅, and hence only one b_x. On the other hand,
⊢ b_x = y → b_x = z → y = z. By schema (1), (3) yields


(∃z)(∀x ∈ A)(∃y ∈ z) y = b_x


Let then C be a new constant, and add


(∀x ∈ A)(∃y)(y ∈ C ∧ y = b_x)


By the one point rule (I.6.2), the above implies (∀x ∈ A) b_x ∈ C; thus {b_x : x ∈ A}
is a set by separation; hence so is ⋃{b_x : x ∈ A}. Call this union D (new
constant).


We are almost done. Let x ∈ A. Then B_x ≠ ∅, hence by VI.6.29, b_x ≠ ∅.
This allows us to add a new constant e and also add e ∈ b_x. By b_x ⊆ D we
have e ∈ D. By b_x ⊆ B_x we have e ∈ B_x. Thus, e ∈ D ∧ e ∈ B_x; hence


(∃y ∈ D) y ∈ B_x


By the deduction theorem x ∈ A → (∃y ∈ D) y ∈ B_x; hence (generalizing)


(∀x ∈ A)(∃y ∈ D) y ∈ B_x


from which, eliminating class notation,


(∃z)(∀x ∈ A)(∃y ∈ z) ℱ(x, y)


This, along with (2), proves collection (III.8.2).


VI.6.30 Example. Let the relation P satisfy a "weak" MC, namely, for every
nonempty set x, there is a P-minimal element y ∈ x, i.e., ¬(∃z ∈ x) z P y, or


P(y) ∩ x = ∅


It will follow that P has "ordinary" (strong) MC, as defined in VI.1.22.
Indeed, let ∅ ≠ A, and let A have no P-minimal elements, that is,


for all a ∈ A, B_a = P(a) ∩ A ≠ ∅    (1)


Now, we have no reason to assume that the B_a's are sets (e.g., P might fail to
be left-narrow). However, by VI.6.29, we get for each a ∈ A a nonempty set
b_a ⊆ B_a. Let


S = ⋃_{a∈A} ({a} × b_a)


Since S ⊆ P, S has "weak" MC as well (compare with Exercise VI.1), and it
is left-narrow, since for all a ∈ dom(S) we have S(a) = b_a. By Exercise VI.4,
there is an a ∈ A such that


S(a) ∩ A = ∅
or
b_a ∩ A = ∅


which contradicts (1).


In particular, taking P = ∈ (the relation "∈"), this shows that the "set version"
(single axiom) of foundation,


(∃x)x ∈ y → (∃x ∈ y)¬(∃z ∈ y)z ∈ x


is equivalent to the “class version” that we gave as a schema (III.7.2). 

As promised in Remark VI.2.14, Corollary VI.2.13 can now be strengthened 
to allow A to be any class, possibly proper. The restriction to a set A in the 
proof of VI.2.13 was meant to allow the use of AC, proving there that well- 
foundedness implies MC. Now let P be well-founded over a class A. The proof 
of VI.2.13, unchanged, now starting with "...let ∅ ≠ B ⊆ A, where B is a
set,† ...", shows that P has weak MC over A. In view of the equivalence of the
strong and weak versions of MC, P has (strong) MC over A. 


VI.7. A Pairing Function on the Ordinals


In this section we establish a useful technical result that we will employ in 
Section VI.9 and in the next chapter. We show that On × On can be
well-ordered, and when this is done,


On × On ≅ On


† The italicised hypothesis is explicitly added since A may be a proper class.
We start by noting that, since for any two ordinals α, β one has either α ≤ β or
β ≤ α, it makes sense to define

max(α, β) = α    if β ≤ α
max(α, β) = β    otherwise

and

min(α, β) = α    if α ≤ β
min(α, β) = β    otherwise

Since ≤ is ⊆, we have max(α, β) = α ∪ β and min(α, β) = α ∩ β.
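
For finite von Neumann ordinals the identities max(α, β) = α ∪ β and min(α, β) = α ∩ β can be observed directly. A small sketch, assuming ordinals are modelled as nested frozensets (the helper `ordinal` is our own name):

```python
def ordinal(n):
    """Finite von Neumann ordinal n = {0, 1, ..., n - 1}."""
    result = frozenset()
    for _ in range(n):
        result = result | {result}
    return result


alpha, beta = ordinal(2), ordinal(5)
print((alpha | beta) == ordinal(max(2, 5)))   # union plays the role of max
print((alpha & beta) == ordinal(min(2, 5)))   # intersection plays the role of min
```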


VI.7.1 Definition. We define a relation ◁ on On × On by
(σ, τ) ◁ (α, β) iff max(σ, τ) < max(α, β) ∨
                    (max(σ, τ) = max(α, β) ∧
                     (σ < α ∨ (σ = α ∧ τ < β)))


VI.7.2 Proposition. ◁ is a well-ordering on On × On.


Proof. We delegate the details, for example that ◁ is a linear order, to the reader
(Exercise VI.26).


Let us argue that it has MC. To this end, let ∅ ≠ A ⊆ On × On. The class
{α ∪ β : (α, β) ∈ A} has a smallest member γ. This is realized as γ = α ∪ β
for some (perhaps several) (α, β) ∈ A. Among those, pick all with the smallest
α (first component), i.e., setting


ℱ(σ, τ) ≝ γ = σ ∪ τ ∧ (σ, τ) ∈ A

form the class

{(α, β) : ℱ(α, β) ∧ (∀σ)(∀τ)(ℱ(σ, τ) → α ≤ σ)}    (1)


and finally pick in (1) that (α, β) with the smallest β.


Let us verify that (α, β) is ◁-minimal in A: If (σ, τ) ◁ (α, β) because σ ∪ τ <
α ∪ β = γ, then (σ, τ) ∉ A by the choice of γ. Let it then be so because σ ∪ τ =
α ∪ β = γ, but σ < α. Then (σ, τ) ∉ A by the choice of α. The last case to
consider also yields (σ, τ) ∉ A, by the choice of β.
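
Restricted to natural numbers, the order of VI.7.1 is just the lexicographic order on the triple (max, first component, second component). The sketch below, with hypothetical names `precedes` and `key`, transcribes the definition and compares it with that sort key; it illustrates the order on a finite square only.

```python
from itertools import product


def precedes(p, q):
    """(s, t) precedes (a, b) in the order of Definition VI.7.1."""
    (s, t), (a, b) = p, q
    return max(s, t) < max(a, b) or (
        max(s, t) == max(a, b) and (s < a or (s == a and t < b))
    )


def key(p):
    """Sort key inducing the same order: compare max, then first, then second."""
    return (max(p), p[0], p[1])


pairs = sorted(product(range(3), repeat=2), key=key)
print(pairs)
```

The three-stage choice in the MC argument (smallest γ, then smallest α, then smallest β) corresponds to taking the minimum of this key.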


VI.7.3 Theorem. (On², ◁) ≅ (On, <).


Proof. By VI.7.2 and VI.3.20 we have one of


(1) (On², ◁) ≅ (<(α), <) for some α,
(2) (◁((α, β)), ◁) ≅ (On, <),
(3) (On², ◁) ≅ (On, <).


To the left of ≅ in (1) we have a proper class (e.g., dom(On²) = On). Thus,
this case is untenable. Case (2) is impossible as well, for the left hand side of
≅ is a subclass of γ × γ, where γ = (α ∪ β) + 1, and hence a set.

This leaves (3) as the only possibility.


As a result, ◁ is left-narrow on On².


VI.7.4 Remark. The unique function J : On² → On that effects the isomorphism
of Theorem VI.7.3 is an instance of a pairing function on the ordinals,
that is, a 1-1, total function On² → On. This particular one is also onto;
thus there is an inverse J⁻¹ : On → On². We reserve the letters K, L to write
J⁻¹ = (K, L). The K, L are the first and second projections of J and satisfy
(K, L) ∘ J = 1_{On²} and J ∘ (K, L) = 1_{On} (the latter only because J is onto).
Thus,


K(J(α, β)) = α
and


L(J(α, β)) = β
for all α, β.


With the aid of K, L we can enumerate all the pairs in On² by the (total)
function α ↦ (K(α), L(α)) on On. Note how each of K and L enumerates
each ordinal σ infinitely often (why?).


Pairing functions play an important role in recursion theory (from where
the notation is borrowed here). For a detailed account of computable pairing
functions on ℕ see Tourlakis (1984).


What are pairing functions good for? Section VI.9 will exhibit a substantial 
application. The next one will be given in Chapter VII. For now let us extend 
the coding of pairs (of ordinals) that J effects into a coding of “vectors” of 
ordinals (compare with II.10.4). 


VI.7.5 Definition. We define by induction on n ∈ ω the functions J_n:


J_1(α) = α                                  for all α
J_{n+1}(α⃗_{n+1}) = J(J_n(α⃗_n), α_{n+1})      for all α⃗_{n+1}


where α⃗_n stands for the sequence α_1, ..., α_n.
We also define for each n ∈ ω the functions π_i^n for† i = 1, ..., n by
π_1^1(α) = α                    for all α
π_{n+1}^{n+1}(α) = L(α)         for all α


π_i^{n+1}(α) = π_i^n(K(α))      for all α


It is trivial to verify that for each n ∈ ω, J_n : On^n → On is a 1-1 correspondence
of which α ↦ (π_1^n(α), ..., π_n^n(α)) is the inverse.
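
On the natural numbers, J, K, L and the vector codings J_n, π_i^n can be written out explicitly. The sketch below rests on an assumption of ours: the closed form used for J on ω² is derived from Definition VI.7.1 by counting predecessors and is not stated in the text. All function names are hypothetical.

```python
from math import isqrt


def J(n, m):
    """Number of pairs below (n, m) in the order of VI.7.1 (naturals only)."""
    return m * m + n if n < m else n * n + n + m


def K(k):
    """First projection of the inverse of J."""
    r = isqrt(k)
    d = k - r * r
    return d if d < r else r


def L(k):
    """Second projection of the inverse of J."""
    r = isqrt(k)
    d = k - r * r
    return r if d < r else d - r


def J_n(alphas):
    """J_1(a) = a and J_{n+1}(a_1..a_{n+1}) = J(J_n(a_1..a_n), a_{n+1})."""
    code = alphas[0]
    for a in alphas[1:]:
        code = J(code, a)
    return code


def proj(n, i, code):
    """pi_i^n(code), following the recursion of the definition above."""
    if n == 1:
        return code
    if i == n:
        return L(code)
    return proj(n - 1, i, K(code))


def decode(n, code):
    return tuple(proj(n, i, code) for i in range(1, n + 1))


code = J_n((3, 1, 4, 1))
print(decode(4, code) == (3, 1, 4, 1))
```

The round trip shows that the tuple of projections inverts J_n, and on this finite range one can also observe J(n, m) ≥ max(n, m).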


VI.7.6 Proposition. For each α, β, J(α, β) ≥ α and J(α, β) ≥ β.


Proof. Fix a β, and let σ < τ.


Case 1. If τ ≤ β, then (σ, β) ◁ (τ, β); hence J(σ, β) < J(τ, β).
Case 2. If τ > β, then σ ∪ β < τ ∪ β, so that again (σ, β) ◁ (τ, β); thus
also J(σ, β) < J(τ, β).


This amounts to λα.J(α, β) being order-preserving on On; hence the first
required inequality follows from VI.3.15. The second inequality is proved
similarly.


VI.7.7 Corollary. For each n, α, and i ∈ n + 1 − {0}, π_i^n(α) ≤ α.


J and < have a number of additional interesting properties. We discuss one 
more here and delegate the others to the exercises section. 


VI.7.8 Example. We show here that J[ω²] = ω.


Throughout this example, ...² stands for ... × ..., not for ordinal multiplication
or exponentiation (which have not been introduced yet anyway).


Let (n, m) ∈ ω², where at least one of n and m is nonzero.


Case 1. n = 0. Then the immediate predecessor is (m − 1, m − 1); hence
J(n, m) = J(m − 1, m − 1) + 1.

Case 2. 0 < n < m. Then the next pair down is (n − 1, m); hence J(n, m) =
J(n − 1, m) + 1.

Case 3. n > m = 0. Then, by VI.7.1, the next pair down is (n − 1, n); hence
J(n, m) = J(n − 1, n) + 1.

Case 4. n ≥ m > 0. Then the next pair down is (n, m − 1); hence J(n, m) =
J(n, m − 1) + 1.


Thus, for each (n, m) ∈ ω², J(n, m) is a successor (or 0 = J(0, 0)); therefore
J[ω²] ⊆ ω. Now J[ω²] = J(0, ω) (Exercises VI.28 and VI.29); hence J[ω²] ≥
ω, by VI.7.6. Thus ω is a fixed point of λα.J[α²].


† i ∈ n + 1 − {0}, if you want to avoid "...".
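
The case analysis can be checked mechanically on a finite square. In the sketch below (hypothetical names `J_table` and `predecessor`), J is computed as the position of a pair when the square is sorted by the order of Definition VI.7.1; this is legitimate because the pairs with maximum below a fixed bound form an initial segment. The immediate predecessors are read off directly from VI.7.1: below (n, 0) with n > 0 sits (n − 1, n), and below (n, m) with n ≥ m > 0 sits (n, m − 1).

```python
from itertools import product


def J_table(size):
    """Map each pair of the square to its position in the order of VI.7.1."""
    pairs = sorted(product(range(size), repeat=2),
                   key=lambda p: (max(p), p[0], p[1]))
    return {p: i for i, p in enumerate(pairs)}


def predecessor(n, m):
    """Immediate predecessor of (n, m) != (0, 0) in the order of VI.7.1."""
    if n == 0:
        return (m - 1, m - 1)       # first pair with maximum m
    if n < m:
        return (n - 1, m)           # still in the block with second component m
    if m == 0:
        return (n - 1, n)           # crossing from the block (., n) to (n, .)
    return (n, m - 1)               # inside the block with first component n


table = J_table(7)
ok = all(table[p] == table[predecessor(*p)] + 1 for p in table if p != (0, 0))
print(ok)
```

Every nonzero pair therefore receives a successor value, which is the step used to conclude J[ω²] ⊆ ω.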


VI.8. Absoluteness 


We expand here on the notions introduced in Section I.7. We will be interested
in exploring the phenomenon where inhabitants of universes M, possibly much
smaller than U_N, can correctly tell that a sentence 𝒜 is true in U_N, even though
their knowledge goes no further than what is going on in their "world" M.

We start by repeating the definition of relativization of formulas, this time
in the specific context of L_Set. Since we here use L_Set as our interpretation
language, we go one step further and interpret ∈ as ∈ and U as U. Thus, we
restate below Definition I.7.3 under these assumptions.


VI.8.1 Definition (Relativization of Formulas and Terms). Given a class M
for which M ≠ ∅ is a theorem‡ and a formula 𝒜, we denote by 𝒜^M the formula
obtained from 𝒜 by replacing each occurrence of (∃x) in it by (∃x ∈ M).


More precisely, by induction on formulas we define


(U(x))^M ≡ U(x)
(x ∈ y)^M ≡ x ∈ y
(x = y)^M ≡ x = y
(¬𝒜)^M ≡ (¬(𝒜^M))
(𝒜 ∨ ℬ)^M ≡ (𝒜^M ∨ ℬ^M)
((∃x)𝒜)^M ≡ ((∃x ∈ M)𝒜^M)

Now let T(u⃗) = {x : 𝒯(x, u⃗)} be a class term depending on the (free) variables
u⃗. Its relativization to a class M is defined as T^M(u⃗) = {x ∈ M : 𝒯^M(x, u⃗)}.
The terminology "T(u⃗) is defined in M" is argot for the assertion "T^M(u⃗) ∈
M is provable on the assumptions u_i ∈ M (for all i)". M is T-closed iff T(u⃗) ∈ M
for all u_i ∈ M.


The reader must have noticed that we now use 𝒜^M rather than 𝒜^ℑ
(cf. I.7.3), as that is the normal practice in the context of set theory.

Recall that the primary logical connectives are ¬, ∨, and ∃. In contrast,
∀, ∧, →, ↔ are defined symbols, which is why the above definition does not
refer to them.


Clearly, if ℱ is quantifier-free, then ℱ^M is ℱ.


‡ Of ZF or of ZFC or of whatever fragment "ZFC′" of ZFC we want to use in a formal
interpretation ℑ = (L_Set, ZFC′, M).

To say that T(u⃗) is defined in M is to say that for all u_i ∈ M, T^M(u⃗) is a
set, and a member of M. Clearly, T^{U_N}(u⃗) = T(u⃗), and "T(u⃗) is defined in U_N"
simply means that for all u_i, T(u⃗) is a set.


VI.8.2 Example. What is {a, b}^M? It is


{x ∈ M : (x = a ∨ x = b)^M} = {x ∈ M : x = a ∨ x = b}
                            = {a, b} ∩ M


This proves (in ZF − f, for example) that
x ∈ M → y ∈ M → {x, y}^M = {x, y}


That is, for a and b chosen in M, "{a, b}" has the same meaning in M as it has
in U_N.


If M is {a, b}-closed, then {a, b} is defined in M.
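
The computation {a, b}^M = {a, b} ∩ M is easy to mimic for a finite M. A toy sketch under our own assumptions (objects are strings, `pair_relativized` is a hypothetical name):

```python
def pair_relativized(a, b, M):
    """{a, b}^M = {x in M : x = a or x = b}."""
    return frozenset(x for x in M if x == a or x == b)


M = {"a", "b", "c"}
print(pair_relativized("a", "b", M) == frozenset({"a", "b"}))   # a, b both in M
print(pair_relativized("a", "z", M) == frozenset({"a"}))        # z lies outside M
```

When both parameters lie in M the relativized term coincides with the pair itself; otherwise only the part inside M survives.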


VI.8.3 Remark ("Truth" in M). We use the short argot "ℱ(x_1, ..., x_n) is true
in M" for the longer argot "ℱ(x_1, ..., x_n) is true in ℑ = (L_Set, ZFC′, M)". We
will often write this assertion as


⊨_M ℱ(x_1, ..., x_n)    (1)


We will recall from I.7.4 the translation of the above argot, (1), where we use
here "ZFC′" for some unspecified fragment of ZFC:


⊢_{ZFC′} x_1 ∈ M ∧ x_2 ∈ M ∧ ··· ∧ x_n ∈ M → ℱ^M(x_1, ..., x_n)    (2)


The part "x_1 ∈ M ∧ x_2 ∈ M ∧ ··· ∧ x_n ∈ M →" in (2) is empty if ℱ is a
sentence.

Platonistically (semantically), "truth in M" is just that, in the sense of
I.5. Indeed, the notation (1) states such truth from the semantic viewpoint
as well. In this and the next section however, our use of (1) is in the syntactic
sense (2).


VI.8.4 Remark. Thinking once again Platonistically (semantically), let us
verify that for any formula ℱ and all a, b, ... in M,


⊨_M ℱ[[a, b, ...]] iff ⊨_{U_N} ℱ^M[[a, b, ...]]    (1)


assuming that N is the supply of atoms we have used to build the von Neumann
universe. This is an easy induction on formulas, and the details are left to the
reader. Here are some cases: For ℱ ≡ U(x) we have, for a ∈ M,


⊨_M U(x)[[a]] iff ⊨_{U_N} U(x)[[a]] iff (cf. VI.8.1) ⊨_{U_N} U^M(x)[[a]]

Say ℱ ≡ ¬𝒜. Then, for a, ... in M,

⊨_M (¬𝒜)[[a, ...]]
iff
⊭_M 𝒜[[a, ...]]
iff (by I.H.)
⊭_{U_N} 𝒜^M[[a, ...]]
iff
⊨_{U_N} (¬𝒜)^M[[a, ...]]

Say ℱ ≡ (∃x)𝒜. Then, for a, ... in M,

⊨_M ((∃x)𝒜)[[a, ...]]
iff
(∃i ∈ M) ⊨_M 𝒜[[i, a, ...]]
iff (by I.H.)
(∃i ∈ M) ⊨_{U_N} 𝒜^M[[i, a, ...]]
iff
(∃i) ⊨_{U_N} (x ∈ M ∧ 𝒜^M)[[i, a, ...]]
iff
⊨_{U_N} (∃x ∈ M)𝒜^M[[a, ...]]
iff
⊨_{U_N} ((∃x)𝒜)^M[[a, ...]]

Thus, there are two ways to semantically evaluate a formula ℱ in M. One is to
act as an inhabitant of M. You then evaluate ℱ in the standard way indicated
in I.5. The other way is to act as an inhabitant of U_N. Before you evaluate ℱ in
M, however, you ensure that it is relativized into ℱ^M and that the values you
plug into free variables are from M. Then, both methods yield the same result.

Note, nevertheless, that an inhabitant of M may think (because he evaluated
so, and he knows of no worlds beyond his to know better) that a sentence ℱ is
true, when in reality, i.e., absolutely speaking, something somewhat different
is true: ℱ^M.

Sometimes (for some ℱ and some M) the reality in M and the absolute
reality coincide, and this is wonderful, for, in that case, what an inhabitant
of M considers to be true is really true. We will explore this phenomenon
shortly.


VI.8.5 Example (Informal). Let M = {a, {a}, {a, b}}, where a ≠ b are
urelements. Set A = {a} and B = {a, b}. Clearly, A ≠ B; hence also (A ≠ B)^M.
Yet,


((∀x)(x ∈ A ↔ x ∈ B))^M

that is,


(∀x ∈ M)(x ∈ A ↔ x ∈ B)
is (really) true. In short (see previous remark)
⊨_M (∀x)(x ∈ A ↔ x ∈ B)


Thus, M does not satisfy extensionality.
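
This small counterexample can be replayed concretely. In the sketch below, which is our own modelling (urelements as strings, sets as frozensets, `members` and `same_members_within` as hypothetical names), the relativized quantifier simply ranges over the elements of M.

```python
a, b = "a", "b"                      # two distinct urelements
A = frozenset({a})
B = frozenset({a, b})
M = {a, A, B}                        # note: b itself is not in M


def members(x):
    """Members of x; an urelement has none."""
    return x if isinstance(x, frozenset) else frozenset()


def same_members_within(X, Y, world):
    """(for all x in world)(x in X <-> x in Y)."""
    return all((x in members(X)) == (x in members(Y)) for x in world)


print(A != B)                                # A and B differ
print(same_members_within(A, B, M))          # yet M cannot tell them apart
print(same_members_within(A, B, M | {b}))    # adding b repairs this
```

Adding the missing urelement b makes the world transitive, and the difference between A and B becomes visible from the inside.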


It turns out that transitive classes do not have this flaw (M, of course, is not 
transitive). 


VI.8.6 Definition (Δ₀-formulas). The set of the Δ₀-formulas is the smallest
subset of all formulas of L_Set that


(1) includes all the atomic formulas (of the types x_i = x_j, U(x_i), x_i ∈ x_j), and
(2) is such that whenever the formulas 𝒜, ℬ are included, so are (¬𝒜),
(𝒜 ∨ ℬ), and ((∃x_i ∈ x_j)𝒜) for any variables x_i, x_j (x_i ≢ x_j).


The Δ₀-formulas are also called restricted formulas, as quantification is always
bounded by asking that the quantified variable belong to some set x_j.

Once more, we refer here only to the connectives ¬, ∨, ∃, since the others
(∀, ∧, etc.) are expressible in terms of them. As always, when writing down
formulas, whether these are restricted or not, we will only use just enough
brackets to avoid ambiguities.


VI.8.7 Lemma. If M is a transitive class and 𝒜(x_1, ..., x_n) is a Δ₀-formula,
then for all x_i ∈ M,


𝒜(x⃗_n) ↔ 𝒜^M(x⃗_n)    (1)


The above claim (1) is short for


⊢ x_1 ∈ M ∧ ··· ∧ x_n ∈ M → (𝒜(x⃗_n) ↔ 𝒜^M(x⃗_n))


Proof. Induction on Δ₀-formulas:
Basis. The contention follows from VI.8.1 and VI.8.6.


Take as I.H. that (1) holds for 𝒜 and ℬ. It is trivial that it holds for ¬𝒜
and 𝒜 ∨ ℬ as well. Let us then concentrate on (∃y ∈ z)𝒜(y, x⃗_n).


→: Assume now b ∈ M, a_1 ∈ M, ..., a_n ∈ M (all these letters are
free variables), and also add the assumption (∃y ∈ b)𝒜(y, a⃗_n), that is,
(∃y)(y ∈ b ∧ 𝒜(y, a⃗_n)). This allows us to introduce the assumption

Y ∈ b ∧ 𝒜(Y, a⃗_n)


where Y is a new constant. Since b ∈ M, it follows that Y ∈ M by transitivity.
Thus, Y ∈ M ∧ Y ∈ b ∧ 𝒜(Y, a⃗_n), from which the basis case and the I.H.
yield (via the Leibniz rule)


Y ∈ M ∧ (Y ∈ b)^M ∧ 𝒜^M(Y, a⃗_n)


The substitution axiom yields


(∃y ∈ M)((y ∈ b)^M ∧ 𝒜^M(y, a⃗_n))    (2)

Thus, using VI.8.1,

((∃y ∈ b)𝒜(y, a⃗_n))^M    (3)


By the deduction theorem (omitting the ⊢-subscript),


⊢ b ∈ M → a_1 ∈ M → ··· → a_n ∈ M → ((∃y ∈ b)𝒜(y, a⃗_n) → ((∃y ∈ b)𝒜(y, a⃗_n))^M)    (4)


←: Conversely, let b ∈ M, a_1 ∈ M, ..., a_n ∈ M (all these letters are free
variables), and also add the assumption (3). By VI.8.1 this yields (2). Hence
(via the Leibniz rule, the basis part, and the I.H.), (∃y ∈ M)(y ∈ b ∧ 𝒜(y, a⃗_n)),
i.e.,


(∃y)(y ∈ M ∧ y ∈ b ∧ 𝒜(y, a⃗_n))

from which tautological implication along with ∃-monotonicity yields


(∃y)(y ∈ b ∧ 𝒜(y, a⃗_n))


The deduction theorem does the rest.


Thus, Platonistically, a Δ₀-formula does not think that it is someone else if
you give it more or less "interpretive freedom" (from a transitive M to U_N and
back). Its "meaning" is, somehow, "absolute".


Less well-endowed formulas could suffer a change in meaning as we go
beyond a transitive class M. For example, (∀x)𝒜(x) can be true if the search
(∀x) is restricted to M, but might fail to be so if the search is extended beyond
M. Similarly, a formula (∃x)𝒜(x) might be true in U_N, for there we can find
an x that works (a "witness"), whereas in a "smaller" class M such an x might
fail to exist.
In view of Remark VI.8.4, we can read VI.8.7 also this way: If ℱ is a
Δ₀-formula and M is transitive, then (Platonistically) for all a, ... in M,


⊨_M ℱ[[a, ...]] iff ⊨_{U_N} ℱ[[a, ...]]    (5)
by (1) in VI.8.4. Thus, an inhabitant of M will think that a Δ₀-sentence is true
iff it is really true.
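
The role of transitivity can be seen on a tiny example with the Δ₀-formula "x is a transitive set", (∀y ∈ x)(∀z ∈ y) z ∈ x. The sketch below is an illustration under our own assumptions (pure hereditarily finite sets as frozensets, hypothetical function names): it evaluates the formula once in the full universe and once with every quantifier confined to a finite world M.

```python
def is_transitive(x):
    """(for all y in x)(for all z in y) z in x, evaluated in the full universe."""
    return all(z in x for y in x for z in y)


def is_transitive_rel(x, M):
    """The same formula with each quantifier restricted to the world M."""
    return all(z in x for y in M if y in x for z in M if z in y)


def is_transitive_class(M):
    """Every member of a member of M is again in M."""
    return all(y in M for s in M for y in s)


zero = frozenset()
one = frozenset({zero})
x = frozenset({one})                 # x = {1}; not transitive, since 0 is missing

M_bad = {x, one}                     # not transitive: 0 is invisible here
M_good = {zero, one, x}              # transitive

print(is_transitive(x))                  # the "real" answer
print(is_transitive_rel(x, M_bad))       # an inhabitant of M_bad is misled
print(is_transitive_rel(x, M_good))      # an inhabitant of M_good agrees with reality
```

In the non-transitive world the witness 0 is missing, so the bounded search (∀z ∈ y) comes up empty; in the transitive world the two evaluations coincide, as the lemma predicts.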


VI.8.8 Definition (Absolute Formulas). For a formula 𝒜(x⃗_n) to be absolute
for a class M (that is not necessarily transitive) means that


⊢ x_1 ∈ M → ··· → x_n ∈ M → (𝒜(x⃗_n) ↔ 𝒜^M(x⃗_n))


A class term T(x⃗_n) is absolute for M iff


⊢ x_1 ∈ M → ··· → x_n ∈ M → T(x⃗_n) = T^M(x⃗_n)


Using the above terminology, VI.8.7 says that Δ₀-formulas are absolute
for transitive classes. Example VI.8.2 shows that (the term) {a, b} is absolute
for any class M.


If T(u⃗) = {x : 𝒯(x, u⃗)} and 𝒯 is absolute for M, then for u_i ∈ M,

T^M(u⃗) = {x ∈ M : 𝒯^M(x, u⃗)}
       = {x ∈ M : 𝒯(x, u⃗)}    by absoluteness of 𝒯
       = T(u⃗) ∩ M

If moreover we know that T(u⃗) ⊆ M for all u_i ∈ M, then T(u⃗) is absolute for
M, as happened in the special case T(x, y) = {x, y}.


In view of Definition VI.8.8, and inspecting the proof of VI.8.7, we can state 
at once: 


VI.8.9 Corollary. The set of formulas that are absolute for some class M is
closed under the Boolean connectives and the bounded quantifiers (∃x ∈ y)
and (∀x ∈ y).


VI.8.10 Corollary. Extensionality holds in any transitive class M.


Proof. Extensionality in M states that


A ∈ M → B ∈ M →
(¬U(A) ∧ ¬U(B) ∧ (∀x ∈ A)x ∈ B ∧ (∀x ∈ B)x ∈ A → A = B)^M

Since inside (...)^M we have a Δ₀-formula, the above is a tautological
consequence of the extensionality axiom and VI.8.7.
VI.8.11 Lemma. Foundation holds in any class M.


Proof. Let M = {x : ℳ(x)}. Foundation says that

(∃x)𝒜[x] → (∃x)(𝒜[x] ∧ ¬(∃y ∈ x)𝒜[y])    (1)

Its relativization to M is

(∃x ∈ M)𝒜^M[x] → (∃x ∈ M)(𝒜^M[x] ∧ ¬(∃y ∈ M)(y ∈ x ∧ 𝒜^M[y]))    (2)


using VI.8.1. Letting now a_1 ∈ M, ..., a_n ∈ M, where a_i are the free variables
in (2), we can have a proof of (2) in ZF, for it is an instance of the schema "∈
(the relation) has MC over M", a provable schema by the foundation axiom
(cf. VI.1.25).


The lemma can be strengthened by effecting the interpretation in "ZF − f", that
is, dropping foundation. We have to take a few precautions:


(1) M is now not arbitrary, but is taken to be any subclass of N ∪ WF_N.
(2) (x, y) is defined by {{x}, {x, y}} to avoid foundation.
(3) Ordinals are defined by VI.4.25.


We know then that we do have foundation in N ∪ WF_N (provably) and can
conclude the proof above in the same way, the only difference being the reason
we have foundation (a theorem rather than an axiom).


VI.8.12 Lemma. For any transitive class M and class term T(u⃗), y = T^M(u⃗)
iff (y = T(u⃗))^M, for all y, u_i ∈ M.


The proof can be carried out in ZF − f. The statement is in the customary argot,
but it states an implication, from premises y ∈ M, u_i ∈ M, to the conclusion


y = T^M(u⃗) ↔ (y = T(u⃗))^M


Proof. We write throughout T(u⃗) = {x : 𝒯(x, u⃗)}.


We calculate as follows:


(¬U(y) ∧ (∀z)(𝒯(z, u⃗) ↔ z ∈ y))^M
↔ ¬U(y) ∧ (∀z ∈ M)(𝒯^M(z, u⃗) ↔ z ∈ y)
↔ ¬U(y) ∧ (∀z)(z ∈ M → (𝒯^M(z, u⃗) ↔ z ∈ y))
↔ ¬U(y) ∧ (∀z)(z ∈ M ∧ 𝒯^M(z, u⃗) ↔ z ∈ y)

The last equivalence above uses the Leibniz rule, the tautology

(𝒜 → (ℬ ↔ 𝒞)) ↔ (𝒜 ∧ ℬ → 𝒞) ∧ (𝒜 ∧ 𝒞 → ℬ)


and transitivity of M, which implies z ∈ y ↔ z ∈ M ∧ z ∈ y. Noting
(I.4.1) that the first line of our calculation says (y = T(u⃗))^M, while the last
says y = T^M(u⃗), we are done.


VI.8.13 Remark. The above result is useful: We often need to show that the
M-relativization of the formula "{x : 𝒜(x, u⃗)} is a set" is provable. That is, we
need to show, for u_i ∈ M (free variables), the derivability of


((∃y) y = A(u⃗))^M    (1)


where we have set A(u⃗) = {x : 𝒜(x, u⃗)}.


(1) stands for

(∃y ∈ M)(y = A(u⃗))^M

which under the assumptions of VI.8.12 is provably equivalent to

(∃y ∈ M) y = A^M(u⃗)


Thus, to prove (1), it is necessary and sufficient to prove that A is defined in M.


One would apply this remark to proving that axioms such as those for
separation, pairing, union, and power set hold in a transitive class.


VI.8.14 Corollary. Let M be transitive, S be absolute for M, and T be absolute
for and defined in M. Assume also that the formula ℱ is absolute for M. Then:


(i) ℱ[T(u⃗)] is absolute for M.
(ii) S[T(u⃗)] is absolute for M.


Proof. Let a_1 ∈ M, ..., u_1 ∈ M, ..., where a_1, ..., u_1, ... are all the free
variables occurring in the formulas above.
(i): We first relativize ℱ[T(u⃗)]. This formula means (see III.11.16)


(∃y)(ℱ[y] ∧ y = T(u⃗))    (1)


Thus, (ℱ[T(u⃗)])^M is provably equivalent to (using the absoluteness
assumptions and Leibniz rule)


(∃y ∈ M)(ℱ[y] ∧ (y = T(u⃗))^M)    (2)

By VI.8.12, (2) is provably equivalent to

(∃y ∈ M)(ℱ[y] ∧ y = T^M(u⃗))

and hence to

(∃y ∈ M)(ℱ[y] ∧ y = T(u⃗))    (3)


since T is absolute for M.

Now we want to argue that (3) is provably equivalent to (1): Indeed, (3)
implies (1) trivially; conversely, (3) follows from (1), for if y (auxiliary constant)
works in the latter, then it satisfies y = T[u⃗]; hence y ∈ M under the assumptions
on T and u⃗.


(ii): Next, start by observing that 𝒮, where S(z⃗) = {x : 𝒮(x, z⃗)}, is absolute
for M, since for x, z_i ∈ M (free variables), using "↔" conjunctionally, we have


𝒮^M(x, z⃗) ↔ x ∈ S^M(z⃗)
          ↔ x ∈ S(z⃗)    by absoluteness of S
          ↔ 𝒮(x, z⃗)


Thus,


(S[T(u⃗)])^M = {x ∈ M : (𝒮[x, T(u⃗)])^M}
            = {x ∈ M : 𝒮[x, T(u⃗)]}, by part (i)†
            = S^M[T(u⃗)]
            = S[T(u⃗)], by absoluteness of S

VI.8.15 Example. In particular, for any transitive class M that is {a, b}-closed, 
{a, {a, b}}™" = {a, {a, b}} and {{a}, {a, b}}M@ = {{a}, fa, b}} are provable in 
ZF — ft for a € M and b € M. In other words, for such a class either imple- 
mentation of the ordered pair (a, b) (among the two that we have mentioned) 
is an absolute term. © 


† Note that $T(\vec{u}) \in M$.
‡ "f" enters in proving $\{a, \{a, b\}\} = \{a', \{a', b'\}\} \to a = a' \wedge b = b'$ and is not needed here.

VI.8.16 Lemma. The following are absolute for any transitive class M:

(i) $A \subseteq B$.
(ii) $A = \emptyset$.
(iii) A is a pair (also, A = {x, y}).
(iv) A is an ordered pair (also, A = (x, y)).
(v) $x = \pi(A)$ ($\pi(A)$ is the first projection if A is an ordered pair; $\emptyset$ otherwise).
(vi) $x = \delta(A)$ ($\delta(A)$ is the second projection if A is an ordered pair; $\emptyset$ otherwise).
(vii) A is a relation.
(viii) A is an order.
(ix) A is a function (also, A is a 1-1 function).
(x) A is a transitive set.
(xi) A is an ordinal.
(xii) A is a limit ordinal.
(xiii) A is a successor.
(xiv) $A \in \omega$ (or, A is a natural number).
(xv) $A = \omega$.
(xvi) {a, b}.
(xvii) $\emptyset$.
(xviii) $A - B$.
(xix) $\bigcup A$.
(xx) $\bigcap A$ (to make this term total we re-define $\bigcap \emptyset$ as $\emptyset$).
(xxi) $x \in \mathrm{dom}(A)$.
(xxii) $x \in \mathrm{ran}(A)$.
(xxiii) $(* x \in \mathrm{dom}(A))\mathcal{F}$, if $\mathcal{F}$ is absolute for M, where "$*$" is $\exists$ or $\forall$.
(xxiv) $(* x \in \mathrm{ran}(A))\mathcal{F}$, if $\mathcal{F}$ is absolute for M, where "$*$" is $\exists$ or $\forall$.


Add the assumption that M is {a, b}-closed. Then the following terms are absolute for M:

(1) $A \times B$.
(2) dom(A).
(3) ran(A).


Proof. Most of these will be left to the reader (Exercise VI.61). Let us sample
a few:

(iii): A is a pair: $(\exists x \in A)(\exists y \in A)(\forall z \in A)(z = x \vee z = y)$. This is a $\Delta_0$-formula,
and hence absolute for any transitive class.

(x): A is a transitive set: $\neg U(A) \wedge (\forall y \in A)(\forall x \in y)\, x \in A$.

(xiv): $A \in \omega$: "A is an ordinal $\wedge$ A is a successor or 0 $\wedge$ $(\forall x \in A)$ x is a
successor or 0" is a $\Delta_0$-formula.

(xxi): Hint. (x, y) = {x, {x, y}}. Thus, if $(x, y) \in A$, then $y \in \bigcup\bigcup A$.

(1): $A \times B = \{z : (\exists x \in A)(\exists y \in B)\, z = (x, y)\}$. Since the defining formula
is $\Delta_0$, $(A \times B)^{M} = (A \times B) \cap M = A \times B$, since M is {x, y}-closed (and hence
(x, y)-closed).
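The bounded-quantifier formulas sampled in the proof translate directly into checks that inspect only the members (and members of members) of A. A minimal sketch, assuming frozensets for sets, strings for urelements, and the usual characterization of an ordinal as a transitive set of transitive sets (the text's own definition may be phrased differently):

```python
def is_set(a):
    """not U(a): sets are frozensets, anything else counts as an urelement."""
    return isinstance(a, frozenset)


def von_neumann(n):
    """The von Neumann natural number n = {0, 1, ..., n-1}."""
    s = frozenset()
    for _ in range(n):
        s = s | {s}
    return s


def is_pair_set(a):
    """(exists x in A)(exists y in A)(forall z in A)(z = x or z = y)."""
    return is_set(a) and any(
        all(z == x or z == y for z in a) for x in a for y in a
    )


def is_transitive(a):
    """not U(A) and (forall y in A)(forall x in y) x in A."""
    return is_set(a) and all(x in a for y in a if is_set(y) for x in y)


def is_ordinal(a):
    """Assumed characterization: transitive, with every member transitive."""
    return is_transitive(a) and all(is_transitive(y) for y in a)
```

Every quantifier above ranges over A or over a member of A, which is the feature that makes the corresponding formulas $\Delta_0$ and hence insensitive to the ambient transitive class.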
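VI.8.17 Example. Let M be a transitive class that is closed under pairs (hence
also under ordered pairs), and $R = \{(x, y) : \mathcal{R}(x, y)\}$ a relation, where $\mathcal{R}$ is
absolute for M. We calculate $R^{M}$, noting that (cf. III.8.7)

$$R = \{z : (\exists x)(\exists y)((x, y) = z \wedge \mathcal{R}(x, y))\}$$

We have

$$R^{M} = \{z \in M : (\exists x \in M)(\exists y \in M)((x, y) = z \wedge \mathcal{R}(x, y))\}$$
$$= \{z : (\exists x)(\exists y)((x, y) = z \wedge z \in M \wedge x \in M \wedge y \in M \wedge \mathcal{R}(x, y))\}$$
$$= \{z : (\exists x)(\exists y)((x, y) = z \wedge x \in M \wedge y \in M \wedge \mathcal{R}(x, y))\}$$
$$= \{(x, y) : x \in M \wedge y \in M \wedge \mathcal{R}(x, y)\}$$
$$= \{(x, y) \in M \times M : \mathcal{R}(x, y)\}$$
$$= R \cap (M \times M)$$
$$= R \mid M$$

The third "=" stems from the assumption that M is closed under pairs, which
leads to the equivalence

$$(x, y) = z \wedge z \in M \wedge x \in M \wedge y \in M \leftrightarrow (x, y) = z \wedge x \in M \wedge y \in M$$

With some practice one tends to shrug off calculations such as the above and
write $R^{M} = \{(x, y) \in M \times M : \mathcal{R}(x, y)\}$ directly.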
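The end result of the calculation, $R^M = R \cap (M \times M)$, is easy to observe on a finite toy instance. The sketch below is only an illustration: it uses Python tuples for ordered pairs and a finite set in place of the class M, and the names `relativize`, `R` and `M` are hypothetical.

```python
def relativize(R, M):
    """R^M for a relation R given as a set of (x, y) tuples."""
    return {(x, y) for (x, y) in R if x in M and y in M}


# A toy relation (the successor relation on 0..5) and a toy "class" M.
R = {(n, n + 1) for n in range(6)}
M = {0, 1, 2, 5}

R_M = relativize(R, M)
M_times_M = {(x, y) for x in M for y in M}
```

Here `R_M` keeps exactly those pairs of R whose two components both lie in M, that is, `R & M_times_M`.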


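VI.8.18 Example. Continuing under the same assumptions as in VI.8.17, let
moreover R be a function, that is, we assume (alternatively, have a proof) that

$$\mathcal{R}(x, y) \wedge \mathcal{R}(x, z) \to y = z \qquad (1)$$

Tautological implication yields

$$x \in M \to y \in M \to z \in M \to \mathcal{R}(x, y) \wedge \mathcal{R}(x, z) \to y = z$$

Hence, using assumption of absoluteness and the Leibniz rule,

$$x \in M \to y \in M \to z \in M \to \mathcal{R}^{M}(x, y) \wedge \mathcal{R}^{M}(x, z) \to y = z \qquad (2)$$

or (by VI.8.1)

$$x \in M \to y \in M \to z \in M \to \big(\mathcal{R}(x, y) \wedge \mathcal{R}(x, z) \to y = z\big)^{M} \qquad (2')$$

That is, in our jargon, "(1) is true in M".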
Let us calculate $R^{M}(a)$ for $a \in M$:

$$R^{M}(a) = \{x \in M : \mathcal{R}^{M}(a, x)\}$$
$$= \{x \in M : \mathcal{R}(a, x)\}$$
$$= M \cap \{x : \mathcal{R}(a, x)\}$$
$$= M \cap R(a)$$


Thus, using the standard function notation “R(a)” ({R(a)} = R(a)), 


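$$R^{M}(a) \simeq \begin{cases} \uparrow & \text{if } R(a) \notin M \\ R(a) & \text{otherwise} \end{cases} \qquad (3)$$

It follows from (3) that to obtain $R(a) \simeq R^{M}(a)$ for all $a \in M$ (the absoluteness
condition) we equivalently need the provability of

$$a \in M \to R(a)\!\downarrow\ \to R(a) \in M$$

that is, that M is R-closed in a sense weaker than that in VI.8.1: $R[M] \subseteq M$.
In terms of $\mathcal{R}$ we need the provability of

$$x \in M \to \mathcal{R}(x, y) \to y \in M \qquad (4)$$

An alternative notation for (4) is obtained if we set D = dom(R):

$$(\forall x \in M \cap D)\, R(x) \in M \qquad (4')$$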


In summary, we define that a function $R = \{(x, y) : \mathcal{R}(x, y)\}$ is absolute for M
(a transitive class that is closed under pairs) iff the following three conditions
hold:

(i) $\mathcal{R}$ is absolute.
(ii) (1) (hence, trivially by (i), (2)) is provable.
(iii) (4) is provable.
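Condition (iii), the closure requirement $R[M] \subseteq M$, can be illustrated on a finite toy function. This is a sketch under stated assumptions: the function is a Python dict, `None` stands in for "undefined" ($\uparrow$), and the helper names are hypothetical.

```python
def apply_rel(R, M, a):
    """R^M(a): the value R(a) if it lies in M, otherwise None (undefined)."""
    value = R.get(a)
    return value if value in M else None


def is_closed(R, M):
    """Condition (4'): R(x) is in M for every x in M that is in dom(R)."""
    return all(R[x] in M for x in M if x in R)


def agrees_on(R, M):
    """Does R^M(a) coincide with R(a) for every a in M?"""
    return all(apply_rel(R, M, a) == R.get(a) for a in M)


R = {0: 2, 2: 4, 4: 0, 1: 3}   # a toy function
M_closed = {0, 2, 4}           # R maps this set into itself
M_open = {0, 1, 2, 4}          # R(1) = 3 escapes from this set
```

On `M_closed` the relativized function agrees with R everywhere; on `M_open` it becomes undefined at 1, which is exactly the failure that (4) rules out.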


A special case occurs if instead of the informal R we have introduced a
formal function symbol, R, by

$$y = R(x) \leftrightarrow \mathcal{R}(x, y) \qquad (5)$$

because we have a proof of

$$(\forall x)(\exists! y)\mathcal{R}(x, y) \qquad (6)$$

Note that (6) combines (1) with $(\forall x)(\exists y)\mathcal{R}(x, y)$ (no "!"). Thus, our conditions
for absoluteness of R, that is, (i)–(iii), simplify to just (i) along with requiring
the provability of

$$\big((\forall x)(\exists y)\mathcal{R}(x, y)\big)^{M} \qquad (7)$$

Of course, one can go back and forth between a total informal R and a formal
R (cf. III.11.20); hence the two conditions (i) and (7) constitute all we need for
absoluteness of a total function R.

Finally, an interesting subcase of the formal R is that of a constant c ("0-ary
function symbol") defined by an absolute for M formula, $\mathcal{C}(y)$, as

$$c = y \leftrightarrow \mathcal{C}(y) \qquad (8)$$

after securing a proof of

$$(\exists! y)\mathcal{C}(y) \qquad (9)$$

In this case the conditions of absoluteness for c are that of $\mathcal{C}$, and the requirement
that the M-relativization of (9) be provable. For example, if $\omega \in M$, then
$\omega^{M} = \omega$ by VI.8.16(xv).


Not all absoluteness results follow from ascertaining that our formulas are
$\Delta_0$. The following is an important example that does not so follow, which uses
some terminology (finiteness) from the sequel, whence the "©" sign.


© VI.8.19 Example. A set A is finite, formally, iff there is an onto $f : n \to A$ for
some $n \in \omega$. That is,

$$\neg U(A) \wedge (\exists f)\big(f \text{ is a function} \wedge \mathrm{dom}(f) \text{ is a natural number} \wedge (\forall y \in A)(\exists x \in \mathrm{dom}(f))(x, y) \in f\big) \qquad (1)$$

Now, (1) is not $\Delta_0$ (and it is known that it is not equivalent to a $\Delta_0$-formula).
Nevertheless, we can show that

"A is a finite set" (2)

is absolute for any transitive class M that satisfies a bit of ZFC. Namely, we want
M to be closed under pairs and to contain $\omega$ ($\omega \in M$). We start by observing
that the formula to the right of $(\exists f)$ is $\Delta_0$. Indeed, in view of VI.8.16 one need
only verify that

"dom(f) is a natural number" is $\Delta_0$ (3)

† Thus, one may introduce the function symbol $R^{M}$ so that $x \in M \to (y = R^{M}(x) \leftrightarrow \mathcal{R}^{M}(x, y))$
and $x \notin M \to R^{M}(x) = \emptyset$ are provable.
The quoted statement in (3) is argot for

$$\mathrm{dom}(f) = \emptyset \vee (\exists x \in \mathrm{dom}(f))\big(\mathrm{dom}(f) = x \cup \{x\} \wedge (\forall y \in \mathrm{dom}(f))(y \text{ is a natural number})\big)$$

In view of the existential quantifier preceding it, $\mathrm{dom}(f) = x \cup \{x\}$ translates to

$$(\forall y \in x)\, y \in \mathrm{dom}(f) \wedge (\forall y \in \mathrm{dom}(f))(y = x \vee y \in x)$$

and we are done with claim (3). Now, by the absoluteness of the component
of (1) to the right of $(\exists f)$, the relativization of the entire formula is provably
equivalent to

$$\neg U(A) \wedge (\exists f \in M)\big(f \text{ is a function} \wedge \mathrm{dom}(f) \text{ is a natural number} \wedge (\forall y \in A)(\exists x \in \mathrm{dom}(f))(x, y) \in f\big) \qquad (4)$$

To prove the absoluteness of (2), we let $A \in M$ and prove the equivalence of (1)
and (4). As (4) → (1) is trivial, we need worry only about (1) → (4). The
whole story is to show that a "witness" f for (1) (auxiliary constant) will work
for (4). For the latter we only need prove $f \in M$. So let f be a new constant,
and add the assumption

$$\neg U(A) \wedge f \text{ is a function} \wedge \mathrm{dom}(f) \text{ is a natural number} \wedge (\forall y \in A)(\exists x \in \mathrm{dom}(f))(x, y) \in f \qquad (5)$$

We set n = dom(f) for convenience. Now transitivity of M and $\omega \in M$ imply
$n \in M$ (and also $n \subseteq M$). By induction on $m \leq n$ we now prove $f \restriction m \in M$.

For m = 0 we are done by $0 \in \omega \in M$. Taking an I.H. for m < n, consider
$f \restriction (m \cup \{m\})$. Now,

$$f \restriction (m \cup \{m\}) = (f \restriction m) \cup \{(m, f(m))\}$$

and thus it is in M by closure under pair and absoluteness of union (VI.8.16).
Thus, $f = (f \restriction n) \in M$.


Pause. How much ZFC did we employ in the above proof? Was the assumption
$\omega \in M$ an overkill? If so, what would be a weaker assumption that still
works? □


© VI.8.20 Example. In our last example we consider a transitive class M that is
a formal model of ZF, that is, we can prove, say, in ZF,

$$a_1 \in M \to \cdots \to a_n \in M \to \mathcal{A}^{M}$$

for every ZF axiom $\mathcal{A}$ of free variables $a_1, \dots, a_n$.†

† Therefore, $\mathfrak{J} = (L_{\mathrm{Set}}, \mathrm{ZF}, M)$ is the model, but it is a common abuse of terminology to say that
M is.

If it helps the intuition, Platonistically, we may think of M as a (real, i.e.,
semantic) model of ZF, i.e., a set (or proper class) where we have interpreted $\in$
as $\in$ and U as U, and all the ZF axioms turned out to be true.


Consider the recursive definition

$$(\forall \alpha)\, F(\alpha) = G(F \restriction \alpha) \qquad (1)$$

where G is total on $U_N$, and moreover is absolute for M. We will show that F
is also absolute for M and that $\mathrm{dom}(F^{M}) = \mathrm{On}^{M}$.

By the way,

$$\mathrm{On}^{M} = \{x \in M : x \text{ is an ordinal}\} = M \cap \mathrm{On}, \text{ by VI.8.16.} \qquad (2)$$

By VI.8.18 we need to prove (in ZF), on the assumptions $x \in M$, x is an ordinal,
and $y \in M$, that

$$(y = F(x))^{M} \leftrightarrow y = F(x) \qquad (3)$$

and also (cf. (4) in VI.8.18)

$$x \in M \wedge (x \text{ is an ordinal}) \to F(x) \in M \qquad (4)$$

Note that the first two assumptions, in view of (2), are jointly equivalent to
$x \in \mathrm{On}^{M}$. We can then use, as usual, $\alpha$ to mean "$x \in \mathrm{On}$ and $x = \alpha$".


By the proof of VI.5.16,† $y = F(\alpha)$ stands for

$$(\exists! f)\big(f \text{ is a function} \wedge \mathrm{dom}(f) = \alpha \cup \{\alpha\} \wedge (\forall \beta \in \alpha \cup \{\alpha\})\, \mathcal{G}(f \restriction \beta, f(\beta))^{\ddagger} \wedge (\alpha, y) \in f\big) \qquad (5)$$

Consulting the list VI.8.16 and also invoking VI.8.14, we observe that, for $\alpha$
and y in M, the relativization of (5), within provable equivalence, introduces
only one annoying part, namely, $(\exists! f \in M)$.

Let then $\alpha \in M$ and $y \in M$. The relativization of (5) is (provably equivalent to)

$$(\exists! f \in M)\big(f \text{ is a function} \wedge \mathrm{dom}(f) = \alpha \cup \{\alpha\} \wedge (\forall \beta \in \alpha \cup \{\alpha\})\, \mathcal{G}(f \restriction \beta, f(\beta)) \wedge (\alpha, y) \in f\big) \qquad (6)$$

† Actually, no detailed proof was given for this particular statement. The detailed proof that applies
here as well, with only notational modifications, was given in VI.2.25 for a more general case of
recursion, not just for recursion over On.

‡ Note that the term $f \restriction x$ is short for $\{z : \pi(z) \in x \wedge z \in f\}$, that is, $\{z : ((\exists y \in x)\, y = \pi(z)) \wedge z \in f\}$,
and is thus absolute, and defined in M on the assumptions $x \in M$ and $f \in M$. Thus VI.8.14
applies.
which we abbreviate as

$$(\exists! f \in M)\, \mathcal{E}(f, \alpha, y) \qquad (6')$$

in what follows. We will have shown (3) if we prove, under our underlined
assumptions, that (5) → (6′), the other direction being trivial.
Add then (5) as an assumption, as well as a new constant g and the assumption

$$\mathcal{E}(g, \alpha, y) \qquad (5')$$

In VI.5.16 we gave a proof in ZF

Pause. In ZF? Is this true?

that

$$(\forall \alpha)(\exists! y)(\exists! f)\, \mathcal{E}(f, \alpha, y) \qquad (7)$$

Since M is a formal model of ZF, and being mindful of the relativization claims
we made a few steps back, we have (by I.7.9) a ZF-proof of

$$(\forall \alpha \in M)(\exists! y \in M)(\exists! f \in M)\, \mathcal{E}(f, \alpha, y) \qquad (8)$$

Specializing (8), we derive $(\exists y \in M)(\exists f \in M)\, \mathcal{E}(f, \alpha, y)$.
Thus, adding two new constants c and h, we may add also

$$\mathcal{E}(h, \alpha, c) \qquad (9)$$

and

$$h \in M \qquad (10)$$

Now, the "!"-notation in (7) is short for $(\forall \alpha)(\exists y)(\exists f)\, \mathcal{E}(f, \alpha, y)$ and

$$\mathcal{E}(f, \alpha, y) \to \mathcal{E}(f', \alpha, y') \to f = f' \wedge y = y' \qquad (11)$$

Thus, since we have (7), the above and (5′) yield y = c and h = g; hence,
from (10), $g \in M$. We have derived (cf. (5′))

$$g \in M \wedge \mathcal{E}(g, \alpha, y)$$

and hence (6′) by the substitution axiom (the "!" is inserted by (11)).

So $y = F(\alpha)$ is absolute. We need to establish (4) to show that F is. In fact,
(4) is a direct result of (8), which also yields $\mathrm{dom}(F^{M}) = M \cap \mathrm{On} = \mathrm{On}^{M}$. □


© VI.8.21 Exercise. As an important application of the above, prove that TC(x)
is absolute for any transitive formal model of ZF, M.

Hint. Recall that

$$TC(x) = x \cup \bigcup x \cup \bigcup\bigcup x \cup \bigcup\bigcup\bigcup x \cup \cdots$$

Express the above as a recursive definition (you only need recursion in $\omega$, but
to fit the result in the style of the previous example, define your function in a
trivial way for arguments $\geq \omega$).


VI.8.22 Exercise. Prove (in ZF) that TC satisfies

$$U(x) \to TC(x) = \emptyset$$

and

$$\neg U(x) \to TC(x) = x \cup \bigcup\{TC(y) : y \in x\}$$
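For hereditarily finite sets, the iterated-union description in the hint to VI.8.21 and the recursive equations of VI.8.22 can be compared directly. The following is a minimal sketch, assuming frozensets for sets and any non-frozenset value (here, strings) for urelements; the function names are illustrative.

```python
def is_set(a):
    return isinstance(a, frozenset)


def big_union(x):
    """Union of x: members of the set-members of x (urelements contribute nothing)."""
    return frozenset(z for y in x if is_set(y) for z in y)


def tc_iterative(x):
    """TC(x) = x ∪ ⋃x ∪ ⋃⋃x ∪ ...; the layers eventually become empty."""
    if not is_set(x):
        return frozenset()
    result, layer = frozenset(), x
    while layer:
        result = result | layer
        layer = big_union(layer)
    return result


def tc_recursive(x):
    """U(x) -> TC(x) = ∅;  not U(x) -> TC(x) = x ∪ ⋃{TC(y) : y in x}."""
    if not is_set(x):
        return frozenset()
    out = set(x)
    for y in x:
        out |= tc_recursive(y)
    return frozenset(out)
```

The loop in `tc_iterative` terminates because each layer consists of objects strictly deeper inside x, and a hereditarily finite set has finite depth.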


VI.8.23 Exercise (Induction on TC(x)). Prove that for any formula $\mathcal{F}(x)$,†

$$\vdash_{ZF} (\forall x)\big((\forall y \in TC(x))\mathcal{F}(y) \to \mathcal{F}(x)\big) \to (\forall x)\mathcal{F}(x)$$

That is, to prove $\mathcal{F}(x)$ we are helped by the assumption $(\forall y \in TC(x))\mathcal{F}(y)$
that we can add for free.

Hint. Start by assuming the hypothesis and proving using foundation (equivalently,
$\in$-induction) that $\vdash_{ZF} (\forall x)(\forall y \in TC(x))\mathcal{F}(y)$.


© VI.8.24 Exercise. Another important application of the technique in VI.8.20
is the following: First, working in ZF, assume that $(\forall x)(\exists! y)\mathcal{G}(x, y)$, and prove
that a unique informal total function F exists (equivalently, you may introduce
a (formal) unary function symbol, F, for F) such that

$$(\forall x)\, F(x) = G(F \restriction TC(x)) \qquad (1)$$

where we have written G for the function given by $y = G(x) \leftrightarrow \mathcal{G}(x, y)$ (we
could also have introduced a formal G).

Hint. Imitate the proof of VI.2.25. Start by showing that y = F(x) (or
y = F(x)) must stand for

$$(\exists! f)\, \mathcal{E}(f, x, y)$$

where $\mathcal{E}(f, x, y)$ abbreviates

$$f \text{ is a function} \wedge \mathrm{dom}(f) = TC(x) \wedge (\forall z \in TC(x))\, \mathcal{G}(f \restriction TC(z), f(z)) \wedge f(x) = y$$

† This works in weaker set theories. For example, neither infinity nor power set axioms are required.

You will need to show that

$$\vdash_{ZF} (\forall x)(\exists! y)(\exists! f)\, \mathcal{E}(f, x, y)$$

Once the fact that F (or F) can be introduced has been established, show that F
is absolute for any transitive formal model of ZF for which G is absolute. This
will imitate the work in VI.8.20. We know that TC(x) is absolute by VI.8.21. We
need to worry about things such as $(\exists y \in TC(x))$, $y \in TC(x)$, and dom(f) =
TC(x). □


VI.8.25 Exercise (Absoluteness of Rank). Prove in ZF that the rank $\rho$ is
absolute for transitive formal models of ZF. Do so by bringing the recursive
definition of rank (cf. VI.6.24) into the form (1) in VI.8.24. Treat N as a
parameter. □
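The recursion scheme $F(x) = G(F \restriction TC(x))$ of VI.8.24 and its use for rank can be sketched for hereditarily finite sets. Two assumptions are made here and should be checked against VI.6.24: urelements and $\emptyset$ both receive rank 0, and the rank of a set is the least number exceeding the ranks of everything in its transitive closure. The names `recurse_over_tc` and `G_rank` are hypothetical.

```python
def is_set(a):
    return isinstance(a, frozenset)


def tc(x):
    """Transitive closure; empty for urelements."""
    if not is_set(x):
        return frozenset()
    out = set(x)
    for y in x:
        out |= tc(y)
    return frozenset(out)


def recurse_over_tc(G, x, memo=None):
    """F(x) = G(F restricted to TC(x)), the restriction being passed as a dict."""
    if memo is None:
        memo = {}
    if x not in memo:
        memo[x] = G({z: recurse_over_tc(G, z, memo) for z in tc(x)})
    return memo[x]


def G_rank(f):
    """Assumed convention: least number above all values of f, and 0 if f is empty."""
    return max((v + 1 for v in f.values()), default=0)


def rank(x):
    return recurse_over_tc(G_rank, x)


def von_neumann(n):
    s = frozenset()
    for _ in range(n):
        s = s | {s}
    return s
```

The dict handed to G plays the role of the "computation" f in the hint to VI.8.24: it records the values of F on TC(x) from which the value at x is obtained.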


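VI.8.26 Exercise. Prove in ZF that the function J as well as the various $\pi_i^n$ of
Section VI.7 are absolute for transitive formal models of ZF.

Hint. J satisfies the recurrence

$$(\forall x \in \mathrm{On})(\forall y \in \mathrm{On})\, J(x, y) = \{J(x', y') : (x', y') \in \mathrm{On}^2 \wedge (x', y') \lhd (x, y)\}$$

or

$$(\forall x \in \mathrm{On})(\forall y \in \mathrm{On})\, J(x, y) = \mathrm{ran}\big(J \restriction \lhd(x, y)\big)$$

Absoluteness follows (after some work) from our standard technique that proves
the existence of recursively defined functions: Start by proving in ZF that
$J(\alpha, \beta) = \gamma$ must be given by

$$(\exists! f)\, \mathcal{E}(f, \alpha, \beta, \gamma)$$

where

$$\mathcal{E}(f, \alpha, \beta, \gamma) \leftrightarrow f \text{ is a function} \wedge \mathrm{dom}(f) = \lhd(\alpha, \beta) \cup \{(\alpha, \beta)\} \wedge (\forall w \in \mathrm{dom}(f))\, f(w) = \mathrm{ran}(f \restriction \lhd w) \wedge f((\alpha, \beta)) = \gamma$$

As in all previous cases where this technique was employed, f simply "codes"
the computation that verifies $J(\alpha, \beta) = \gamma$. Now you will need a few absoluteness
lemmata to conclude your case. For example, you will need the
absoluteness of $x \in \lhd(\alpha, \beta)$. This is equivalent to

$$OP(x) \wedge \pi(x) \in \mathrm{On} \wedge \delta(x) \in \mathrm{On} \wedge \big[\big((\pi(x) \in \alpha \vee \pi(x) \in \beta) \wedge (\delta(x) \in \alpha \vee \delta(x) \in \beta)\big) \vee \big(\pi(x) \cup \delta(x) = \alpha \cup \beta \wedge (\pi(x) < \alpha \vee \pi(x) = \alpha \wedge \delta(x) < \beta)\big)\big]$$

etc.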
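Restricted to natural numbers, the ordering described by the last formula (compare maxima first, then first components, then second components) and the function J that counts predecessors can be written out explicitly. This is only a finite sketch of the idea, not the J of Section VI.7 on all of On; the closed form for `J` below is derived here from that ordering and is an assumption of the example.

```python
def precedes(p, q):
    """(x1, y1) comes before (x2, y2): smaller maximum, or equal maxima and lexicographically smaller."""
    (x1, y1), (x2, y2) = p, q
    m1, m2 = max(x1, y1), max(x2, y2)
    return m1 < m2 or (m1 == m2 and (x1 < x2 or (x1 == x2 and y1 < y2)))


def J(x, y):
    """Number of pairs of naturals that precede (x, y)."""
    m = max(x, y)
    # m*m pairs have maximum below m; then come (0,m),...,(m-1,m),(m,0),...,(m,m).
    return m * m + x if x < m else m * m + m + y


def J_by_recurrence(x, y, bound):
    """J(x, y) as the size of {J(x', y') : (x', y') precedes (x, y)}; pairs are searched below `bound`."""
    preds = {J(a, b) for a in range(bound) for b in range(bound)
             if precedes((a, b), (x, y))}
    return len(preds)
```

Since every predecessor of (x, y) has both components at most max(x, y), choosing `bound` larger than max(x, y) makes the brute-force recurrence exhaustive.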
© We now revisit Section IV.2 from a formal point of view. The quest there was
simply to show that AC is plausible, but here we will do more.

We will define a cumulative hierarchy of sets, similar to the von Neumann
hierarchy, but as in Section IV.2, and unlike what we did in Section VI.6,
we will be careful not to admit all the sets that a "powering stage" yields.† We
will accept instead only those sets that can be defined by explicit and "simple"
operations based on what is available "so far". This is one of the differences.
The other essential difference is that no two sets are constructed at the same
stage (even the urelements will be given at distinct stages).

The construction will build a formal model, $\mathfrak{J} = (L_{\mathrm{Set}}, \mathrm{ZF}, L_N)$ of ZFC,
where, as in Section VI.6, $\in$ will be interpreted as $\in$ and U as U.‡ The proof
of AC in the model will be, unlike the informal argument of Section IV.2, a
consequence of the fact that a construction stage produces a unique constructible
object. In particular, all this work will establish that if ZF is consistent, then so
is ZFC (cf. Section I.7), i.e., adding AC doesn't hurt a theory that is not already
broken.

The whole idea of the construction of $\mathfrak{J}$ is due to Gödel (1938, 1939, 1940),
who gave two different constructions, one outlined in Section IV.2, the other to
be followed here. The story has been retold by many, in many slightly different
ways. Our version is influenced by the accounts given by Shoenfield (1967),
Barwise (1975), and Jech (1978b).

After these preliminaries, we embark now upon the construction of Gödel's
constructible universe $L_N$ over any appropriately chosen set of urelements
N. Sets will be built by iterating some simple "explicit" operations (Gödel
operations). There have been many variations in the choice of these operations.
The following definition is one of these variations (close to the version
in Shoenfield (1967), but with some departures for convenience and
user-friendliness).

† These are in $V_N(\alpha + 1) = \mathsf{P}(N \cup V_N(\alpha))$ when we start with the set of urelements N.
‡ Thus, $\mathfrak{J}$ will be a so-called $\in$-model, or more accurately, $(U, \in)$-model.


VI.9.1 Definition (The Gödel Operations). We call the terms $\mathfrak{F}_i$ below the
Gödel operations:

$$\mathfrak{F}_0(x, y) = x - y$$
$$\mathfrak{F}_1(x, y) = x \cap \mathrm{dom}(y)$$
$$\mathfrak{F}_2(x, y) = \{z \in x : U(z)\}$$
$$\mathfrak{F}_3(x, y) = \{(u, v) \in x : u = v\}$$
$$\mathfrak{F}_4(x, y) = \{(u, v) \in x : u \in v\}$$
$$\mathfrak{F}_5(x, y) = \{(u, v) \in x : (v, u) \in y\}$$
$$\mathfrak{F}_6(x, y) = \{(u, v, w) \in x : (v, w, u) \in y\}$$
$$\mathfrak{F}_7(x, y) = \{(u, v, w) \in x : (u, w, v) \in y\}$$
$$\mathfrak{F}_8(x, y, z) = x \cap (y \times z)$$
$$\mathfrak{F}_9(x, y) = \{x, y\}$$

The purpose of the Gödel operations is to provide "normalized" terms which
by repeated composition (substitution of one into the other) will form all the
"constructible" sets.

Indices 2–4 take care of the atomic formulas of set theory. Indices 5–7 ensure
that we can manipulate "vectors", and, in particular, provide tools to address
the fact that $(u, v, w) \neq (u, (v, w))$. Note the absence of power set operations
(we want to provide subsets in a "controlled" manner). Note also that for each
$i = 0, \dots, 7$, $\mathfrak{F}_i(x, y) \subseteq x$, and $\mathfrak{F}_8(x, y, z) \subseteq x$, a "technical" fact from which
we will benefit (cf. VI.9.3 below). This technicality compelled the choice of the
rather awkward $x \cap \mathrm{dom}(y)$ and $x \cap (y \times z)$ instead of just dom(y) and $y \times z$
at indices 1 and 8.
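On hereditarily finite sets the ten operations are directly computable. The sketch below assumes frozensets for sets, strings for urelements, the pair coding (u, v) = {u, {u, v}} used earlier in the text, and triples coded as ((u, v), w); all helper names are illustrative, and the arguments are assumed to be sets.

```python
def is_set(a):
    return isinstance(a, frozenset)

def pair(a, b):
    return frozenset({a, frozenset({a, b})})

def triple(a, b, c):
    return pair(pair(a, b), c)

def unpair(p):
    """Return (a, b) if p = {a, {a, b}}, else None."""
    if not is_set(p) or len(p) != 2:
        return None
    for s in p:
        if is_set(s) and len(s) <= 2:
            (a,) = p - {s}
            if a in s:
                rest = s - {a}
                return a, (next(iter(rest)) if rest else a)
    return None

def untriple(t):
    outer = unpair(t)
    inner = unpair(outer[0]) if outer else None
    return (inner[0], inner[1], outer[1]) if inner else None

def dom(y):
    return frozenset(d[0] for d in map(unpair, y) if d)

def pairs_in(x):
    return [(p,) + unpair(p) for p in x if unpair(p)]

def triples_in(x):
    return [(t,) + untriple(t) for t in x if untriple(t)]

def F0(x, y): return x - y
def F1(x, y): return x & dom(y)
def F2(x, y): return frozenset(z for z in x if not is_set(z))
def F3(x, y): return frozenset(p for p, u, v in pairs_in(x) if u == v)
def F4(x, y): return frozenset(p for p, u, v in pairs_in(x) if is_set(v) and u in v)
def F5(x, y): return frozenset(p for p, u, v in pairs_in(x) if pair(v, u) in y)
def F6(x, y): return frozenset(t for t, u, v, w in triples_in(x) if triple(v, w, u) in y)
def F7(x, y): return frozenset(t for t, u, v, w in triples_in(x) if triple(u, w, v) in y)
def F8(x, y, z): return frozenset(p for p, u, v in pairs_in(x) if u in y and v in z)
def F9(x, y): return frozenset({x, y})
```

Every operation except `F9` only filters its first argument, which is the "technical" fact $\mathfrak{F}_i(x, y) \subseteq x$ noted above.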


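VI.9.2 Definition (The Sets Constructible from N). Fix a set N of urelements
and a 1-1 function f from $\|N\|$ onto N, where we have written $\|N\|$ for dom(f),
an ordinal.†

† This means the following: On one hand, trivially, $\vdash_{ZF} (\exists x)(\exists y)(\neg U(x) \wedge (\forall z \in x)U(z) \wedge y$
is a 1-1 function $\wedge\ \mathrm{dom}(y) \in \mathrm{On} \wedge \mathrm{ran}(y) = x)$. For example, $x = y = \emptyset$ work, and then we
can invoke the substitution axiom. Now one can introduce new constants N and f along with
assumption (cf. p. 75),

$$\neg U(N) \wedge (\forall z \in N)U(z) \wedge f \text{ is a 1-1 function} \wedge \mathrm{dom}(f) \in \mathrm{On} \wedge \mathrm{ran}(f) = N$$

Define by recursion over On the function $\alpha \mapsto F_\alpha$:

for $\alpha < \|N\|$: $F_\alpha = f(\alpha)$,
for $\alpha \geq \|N\|$: $F_\alpha = \bigcup_{\|N\| \leq \beta < \alpha} F_\beta$ if $\mathrm{Lim}(\alpha) \vee \alpha = 0$
for $\alpha + 1 > \|N\|$: $F_{\alpha+1} = \mathfrak{F}_i(F_{\pi_1^4(\alpha)}, F_{\pi_2^4(\alpha)})$ if $\pi_4^4(\alpha) = i < 8$
$F_{\alpha+1} = \mathfrak{F}_8(F_{\pi_1^4(\alpha)}, F_{\pi_2^4(\alpha)}, F_{\pi_3^4(\alpha)})$ if $\pi_4^4(\alpha) = 8$
$F_{\alpha+1} = \mathfrak{F}_9(F_{\pi_1^4(\alpha)}, F_{\pi_2^4(\alpha)})$ if $\pi_4^4(\alpha) \geq 9$

where the $\pi_i^4$ are those of VI.7.5.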

An object x (i.e., set or urelement) is constructible (from N, a qualification
that is omitted if it is clear from the context) just in case $x = F_\alpha$ for some $\alpha$.
We will also say that x is N-constructible.

$L_N$ is the class of all objects constructible from N (if $N = \emptyset$, then we write
L rather than $L_N$).

We will use the notation ord(x) to indicate $\min\{\alpha : F_\alpha = x\}$. We will pronounce
ord(x) "order of x".

The previous recursive definition is appropriate, since $\pi_i^4(\alpha) \leq \alpha < \alpha + 1$ for
i = 1, 2, 3, 4 (see VI.7.7). It uses ordinals to systematically iterate the Gödel
operations "as long as possible". In this section we work in ZF; thus we had to
ask that N be well-ordered (in ZFC, of course, every set is well-orderable by
Zermelo's theorem). The subcase "$\vee\ \alpha = 0$" (part of the case "for $\alpha \geq \|N\|$")
takes care of the situation where $N = \emptyset$. Then $F_0 = \emptyset$.


The reader will want to compare the above definition with Definition VI.6.1.
$L_N$ parallels $U_N$ (rather than $V_N$). The major differences between the two
definitions are:

(1) Instead of using the power set operation at successor ordinal stages, we
are using explicit (Gödel) operations that are much "weaker" than forming
power sets, to construct one member of the hierarchy at a time.

(2) The urelements are not given at once (stage 0), but are "built" one at a time;
it takes $\|N\|$ steps to have them all.

Between successive limit ordinals we ensure that each case (among the
ten Gödel operations) gets "equal opportunity" to apply (at successor ordinal
stages), by using a technique recursion theorists call "dovetailing".†

† For each case, according as $\pi_4^4(\alpha) = 0, 1, \dots, 8$, or > 8, all pairs $(\pi_1^4(\alpha), \pi_2^4(\alpha))$ and all triples
$(\pi_1^4(\alpha), \pi_2^4(\alpha), \pi_3^4(\alpha))$ will be considered, since $\alpha \mapsto (\pi_1^4(\alpha), \pi_2^4(\alpha), \pi_3^4(\alpha), \pi_4^4(\alpha))$ is onto, as
follows from VI.7.5 by the observation that (K, L) is onto.

Note that ord(x) is similar to $\rho(x)$ of VI.6.19, but it is a different function
here, whence the different symbol.

We easily see by induction on $\alpha$ that $\vdash_{ZF} sp(F_\alpha) \subseteq N$. □
The following few lemmata are central: 
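The following few lemmata are central:

VI.9.3 Lemma. Let $x \in L_N$. Then $y \in x$ implies $y \in L_N$ and ord(y) < ord(x).

Proof. We do induction on ord(x), setting $\gamma = \mathrm{ord}(x)$ for convenience.
If $\gamma < \|N\|$ or $\|N\| \leq \gamma = 0$, then the claim is vacuously satisfied.

Let then $\gamma \geq \|N\|$ and $\mathrm{Lim}(\gamma)$. By VI.9.2, $x = \bigcup_{\|N\| \leq \beta < \gamma} F_\beta$. Then $y \in x$
implies $y \in F_\beta$ for some $\|N\| \leq \beta < \gamma$. By the obvious I.H. (since $\mathrm{ord}(F_\beta) \leq
\beta < \gamma$), we have $y \in L_N$ and $\mathrm{ord}(y) < \mathrm{ord}(F_\beta) < \gamma = \mathrm{ord}(x)$.

Let $\gamma = \alpha + 1 > \|N\|$. We have cases according to $i = \pi_4^4(\alpha)$:

$$y \in x = F_{\alpha+1} = \begin{cases} \mathfrak{F}_i(F_{\pi_1^4(\alpha)}, F_{\pi_2^4(\alpha)}) & \text{if } i < 8 \\ \mathfrak{F}_8(F_{\pi_1^4(\alpha)}, F_{\pi_2^4(\alpha)}, F_{\pi_3^4(\alpha)}) & \text{if } i = 8 \end{cases}$$

Then

$$y \in x \subseteq F_{\pi_1^4(\alpha)}$$

by VI.9.1. By the obvious I.H. (note that $\mathrm{ord}(F_{\pi_1^4(\alpha)}) \leq \pi_1^4(\alpha) \leq \alpha < \alpha + 1$),
we have $y \in L_N$ and $\mathrm{ord}(y) < \mathrm{ord}(F_{\pi_1^4(\alpha)}) < \alpha + 1 = \mathrm{ord}(x)$.

Finally, let $y \in x = F_{\alpha+1} = \mathfrak{F}_9(F_{\pi_1^4(\alpha)}, F_{\pi_2^4(\alpha)})$. Then $y = F_{\pi_j^4(\alpha)}$ for j = 1
or j = 2; hence $\mathrm{ord}(y) = \mathrm{ord}(F_{\pi_j^4(\alpha)}) \leq \pi_j^4(\alpha) \leq \alpha < \alpha + 1$. Moreover, $y \in L_N$,
since it is an "$F_\beta$".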


VI.9.4 Corollary. $L_N$ is a transitive class.

VI.9.5 Lemma. $L_N$ is closed under $\mathfrak{F}_i$ for $i = 0, \dots, 9$.

Proof. Straightforward index computations, and VI.9.2, yield this claim. For
example, say that x, y (sets or atoms) are in $L_N$, so that $x = F_\alpha$ and $y = F_\beta$.
Then $J_4(\alpha, \beta, \|N\|, 9) \geq \|N\|$ (by VI.7.7) and

$$L_N \ni F_{J_4(\alpha, \beta, \|N\|, 9)+1} = \mathfrak{F}_9(x, y) = \{x, y\}$$

Note, in particular, that

$$L_N \ni F_{J_4(\alpha, \alpha, \|N\|, 9)+1} = \mathfrak{F}_9(x, x) = \{x\}$$

Similarly, if x, y are sets with orders as above, $x \cap \mathrm{dom}(y) = F_{J_4(\alpha, \beta, 1, 1)+1}$ (why
is $J_4(\alpha, \beta, 1, 1) + 1 > \|N\|$?) The remaining cases are left to the reader.


VI.9.6 Corollary. If each $x_i$ ($i = 1, \dots, n$) is in $L_N$, so is $(x_1, x_2, \dots, x_n)$.

This is a theorem schema: one theorem for each $n \in \mathbb{N}$. The proof is by informal
induction on n.

Proof. Induction on n. For n = 1, $(x_1) = x_1$ and there is nothing to prove.
We proceed to n + 1 via (I.H.) n: Now (u, v) = {u, {u, v}}; hence it is in $L_N$
whenever u, v are. Thus, $(x_1, \dots, x_{n+1}) = ((x_1, \dots, x_n), x_{n+1}) \in L_N$ by I.H.


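VI.9.7 Lemma. If x is a set and $x \subseteq L_N$, then there is a set $y \in L_N$ such that
$x \subseteq y$.

Proof. The hypothesis yields, via Lemma VI.9.5,

$$(\forall z \in x)(\exists \alpha)\{z\} = F_\alpha$$

By collection, there is a set A such that

$$(\forall z \in x)(\exists \alpha \in A)\{z\} = F_\alpha$$

Let $\gamma = \bigcup_{\alpha \in A} \alpha$ (VI.5.22). Borrowing two results from the next section (VI.10.3
and VI.10.11), we note that $\mathrm{Lim}(\gamma + \omega)$ and $\gamma < \gamma + \omega$. Thus

$$y = F_{\gamma+\omega} = \bigcup_{\|N\| \leq \alpha < \gamma+\omega} F_\alpha$$

will do.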


VI.9.8 Lemma. $L_N$ is closed under $\cap$, $\cup$, and $\times$.

Proof. Let the sets x, y be in $L_N$. Then $x \cap y = x - (x - y) = \mathfrak{F}_0(x, \mathfrak{F}_0(x, y))$;
thus it is in $L_N$ by VI.9.5.

Closure under $\cup$ follows by this argument: $x \cup y \subseteq L_N$ (by transitivity of $L_N$);
hence (VI.9.7), for some $z \in L_N$, $x \cup y \subseteq z$. But then, $x \cup y = z - ((z - x) \cap (z - y))$.

Finally, $x \times y \subseteq L_N$, by VI.9.3 and VI.9.6. Thus, for some $z \in L_N$,
$x \times y \subseteq z$. Hence $x \times y = (x \times y) \cap z = \mathfrak{F}_8(z, x, y)$.
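The two Boolean identities used in this proof reduce intersection and union to the difference operation $\mathfrak{F}_0$ alone. A small sanity check on finite sets, where `z` is any superset of both arguments (the concrete sets below are arbitrary illustrations):

```python
def F0(x, y):
    """The Gödel operation x - y."""
    return x - y


def inter(x, y):
    """x ∩ y = x - (x - y)."""
    return F0(x, F0(x, y))


def union_within(z, x, y):
    """x ∪ y = z - ((z - x) ∩ (z - y)), valid whenever x ∪ y ⊆ z."""
    return F0(z, inter(F0(z, x), F0(z, y)))


x = frozenset({1, 2, 3})
y = frozenset({3, 4})
z = frozenset(range(10))       # some set containing both x and y
```

The bounding set z is what VI.9.7 supplies in the proof; without it the complement-based formula for the union would have nothing to be taken relative to.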


© Pause. But what about $x \cap y$ when, say, U(x)? We have said in Chapter III that
the formal operations ($\cap$, $\cup$, $-$) are total, so they make sense on atoms (in N)
too.


VI.9.9 Lemma. $L_N$ is closed under dom.

Proof. Let $x \in L_N$ and $z \in x$. By two applications of VI.9.3, $\pi(z)$, i.e., the y
which for some w satisfies $z = \{y, \{y, w\}\} \in x$, is in $L_N$. Thus, $\mathrm{dom}(x) \subseteq L_N$
and, of course, dom(x) is a set by collection. Thus (VI.9.7), for some $u \in L_N$,
$\mathrm{dom}(x) \subseteq u$, and hence we are done by (VI.9.5), since $\mathrm{dom}(x) = \mathrm{dom}(x) \cap u =
\mathfrak{F}_1(u, x)$.


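VI.9.10 Lemma (Three "Derived" Gödel Operations). $L_N$ is closed under

$$\mathfrak{F}_{10}(x) = \{(u, v) : (v, u) \in x\}$$
$$\mathfrak{F}_{11}(x) = \{(u, v, w) : (v, w, u) \in x\}$$
$$\mathfrak{F}_{12}(x) = \{(u, v, w) : (u, w, v) \in x\}$$

Proof. Let x be in $L_N$. Using VI.9.3, collection, and VI.9.7, we have sets
$y_1, y_2, y_3$ in $L_N$ such that

$$\mathfrak{F}_{10}(x) \subseteq y_1$$
$$\mathfrak{F}_{11}(x) \subseteq y_2$$
$$\mathfrak{F}_{12}(x) \subseteq y_3$$

Thus, $\mathfrak{F}_{10}(x) = \mathfrak{F}_5(y_1, x)$, $\mathfrak{F}_{11}(x) = \mathfrak{F}_6(y_2, x)$, and $\mathfrak{F}_{12}(x) = \mathfrak{F}_7(y_3, x)$. We are
done by VI.9.5.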


The following lemma shows that introducing dummy variables does not take
us out of $L_N$. It forms the fundamental step of the main result of this section, that
$L_N$ is a model of ZFC, in that it helps to show that $L_N$ satisfies the separation
axiom.

VI.9.11 Lemma. For all $n \geq 1$, all i, j among $1, \dots, n$, and all N-constructible
sets $a_1, \dots, a_n, b$, the set

$$\mathfrak{F}^{(n,i,j)}(a_1, \dots, a_n) = \{(u_1, \dots, u_n) \in a_1 \times \cdots \times a_n : (u_i, u_j) \in b\}$$

is in $L_N$.†

† This too is a theorem schema.
Proof. Let first $i \neq j$. We do (informal) induction on the length of $(u_1, \dots, u_n)$.

For the basis, we have n = 2, and the result follows from VI.9.8 and VI.9.5
on observing that

$$\mathfrak{F}^{(2,1,2)}(a_1, a_2) = \{(u_1, u_2) \in a_1 \times a_2 : (u_1, u_2) \in b\} = (a_1 \times a_2) \cap b$$

if i < j, and

$$\mathfrak{F}^{(2,2,1)}(a_1, a_2) = \{(u_1, u_2) \in a_1 \times a_2 : (u_2, u_1) \in b\} = \mathfrak{F}_5(a_1 \times a_2, b)$$

otherwise.

For the induction step we consider cases.

Case $n \notin \{i, j\}$. By I.H.,

$$\mathfrak{F}^{(n-1,i,j)}(a_1, \dots, a_{n-1}) = \{(u_1, \dots, u_{n-1}) \in a_1 \times \cdots \times a_{n-1} : (u_i, u_j) \in b\}$$

is in $L_N$. But then so is

$$\mathfrak{F}^{(n,i,j)}(a_1, \dots, a_n) = \mathfrak{F}^{(n-1,i,j)}(a_1, \dots, a_{n-1}) \times a_n$$

by VI.9.8.

Case $n \in \{i, j\}$ and i, j are consecutive integers. If n = 2, then we are back
to the basis step, so assume n > 2. Now, observe that $(u_1, \dots, u_{n-1}, u_n) =
((u_1, \dots, u_{n-2}), u_{n-1}, u_n)$, and set

$$\mathfrak{F}'(a_{n-1}, a_n, (a_1 \times \cdots \times a_{n-2})) = \{(u_{n-1}, u_n, (u_1, \dots, u_{n-2})) \in (a_{n-1} \times a_n) \times (a_1 \times \cdots \times a_{n-2}) : (u_i, u_j) \in b\}$$

Clearly, $\mathfrak{F}'(a_{n-1}, a_n, (a_1 \times \cdots \times a_{n-2})) \in L_N$ by the previous case (and VI.9.8),
and

$$\mathfrak{F}^{(n,i,j)}(a_1, \dots, a_n) = \{(u_1, \dots, u_n) \in a_1 \times \cdots \times a_n : (u_i, u_j) \in b\} = \mathfrak{F}_{11}\big(\mathfrak{F}'(a_{n-1}, a_n, (a_1 \times \cdots \times a_{n-2}))\big)$$

Therefore, the latter is in $L_N$ by VI.9.10.

Case $n \in \{i, j\}$ and i, j are not consecutive integers. By the first case,

$$\mathfrak{F}''(a_1, \dots, a_n) = \{(u_1, \dots, u_{n-2}, u_n, u_{n-1}) \in a_1 \times \cdots \times a_{n-2} \times a_n \times a_{n-1} : (u_i, u_j) \in b\}$$

is in $L_N$. Since $(u_1, \dots, u_{n-1}, u_n) = ((u_1, \dots, u_{n-2}), u_{n-1}, u_n)$, we conclude
that

$$\mathfrak{F}^{(n,i,j)}(a_1, \dots, a_n) = \{(u_1, \dots, u_n) \in a_1 \times \cdots \times a_n : (u_i, u_j) \in b\} = \mathfrak{F}_{12}\big(\mathfrak{F}''(a_1, \dots, a_n)\big)$$

is in $L_N$.

Case i = j, finally. By VI.9.5 and VI.9.8,

$$\mathfrak{F}^{(2,1,1)}(a_i, a_i) = \{(u_i, u_i) \in a_i \times a_i : (u_i, u_i) \in b\} = \mathfrak{F}_3(c, c)$$

where $c = (a_i \times a_i) \cap b$, is in $L_N$. Clearly, $(u_i, u_i) \in \mathfrak{F}_3(c, c)$ iff $u_i \in \mathrm{dom}(\mathfrak{F}_3(c, c))$.
Thus,

$$\mathfrak{F}^{(n,i,i)}(a_1, \dots, a_n) = a_1 \times \cdots \times a_{i-1} \times \mathrm{dom}(\mathfrak{F}_3(c, c)) \times a_{i+1} \times \cdots \times a_n$$

where, without loss of generality, we have assumed that i is "in general position"
(1 < i < n). The result follows from VI.9.9.


We need one more lemma before we can successfully tackle separation in $L_N$.

VI.9.12 Lemma. For each $n \geq 1$ and N-constructible sets $a_1, \dots, a_n$ and for
each formula $\mathcal{A}(\vec{u}_n)$ of set theory, the set

$$\mathfrak{F}_{\mathcal{A}}(a_1, \dots, a_n) = \{(u_1, \dots, u_n) \in a_1 \times \cdots \times a_n : \mathcal{A}^{L_N}(\vec{u}_n)\}$$

is in $L_N$.

Some of the arguments in $\mathcal{A}(\vec{u}_n)$ might be dummy ones, as in $\lambda u_1 u_2 u_3 . u_3 \in u_1$.
The round brackets, according to our earlier conventions, mean that (dummy
or not) "$\vec{u}_n$" is the entire list of variables relevant to $\mathcal{A}$. □

Proof. We do induction on formulas $\mathcal{A}$. The reader will want to keep in mind
Definition VI.8.1.

Case $\mathcal{A}(\vec{u}_n)$ is $\lambda \vec{u}_n . U(u_i)$. Then, since $(U(x))^{L_N}$ is U(x), we obtain

$$\mathfrak{F}_{\mathcal{A}}(a_1, \dots, a_n) = \{(u_1, \dots, u_n) \in a_1 \times \cdots \times a_n : U(u_i)\} = a_1 \times \cdots \times a_{i-1} \times \mathfrak{F}_2(a_i, a_i) \times a_{i+1} \times \cdots \times a_n$$

in $L_N$, by VI.9.5 and VI.9.8.

† Of course, "×" associates left to right, and we omitted brackets to avoid cluttering the notation.
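Case $\mathcal{A}(\vec{u}_n)$ is $\lambda \vec{u}_n . u_i = u_j$ (possibly i = j). Then, since $(x = y)^{L_N}$ is x = y,
we obtain

$$\mathfrak{F}_{\mathcal{A}}(a_1, \dots, a_n) = \{(u_1, \dots, u_n) \in a_1 \times \cdots \times a_n : u_i = u_j\}$$
$$= \{(u_1, \dots, u_n) \in a_1 \times \cdots \times a_n : (u_i, u_j) \in \mathfrak{F}_3(a_i \times a_j, a_i)\}$$

in $L_N$, by VI.9.5, VI.9.8, and VI.9.11.

Case $\mathcal{A}(\vec{u}_n)$ is $\lambda \vec{u}_n . u_i \in u_j$ (possibly i = j). Then, since $(x \in y)^{L_N}$ is
$x \in y$, we obtain

$$\mathfrak{F}_{\mathcal{A}}(a_1, \dots, a_n) = \{(u_1, \dots, u_n) \in a_1 \times \cdots \times a_n : u_i \in u_j\}$$
$$= \{(u_1, \dots, u_n) \in a_1 \times \cdots \times a_n : (u_i, u_j) \in \mathfrak{F}_4(a_i \times a_j, a_i)\}$$

in $L_N$, by VI.9.5, VI.9.8, and VI.9.11.

Case $\mathcal{A}(\vec{u}_n)$ is $\neg \mathcal{B}(\vec{u}_n)$. By I.H.,

$$\mathfrak{F}_{\mathcal{B}}(a_1, \dots, a_n) = \{(u_1, \dots, u_n) \in a_1 \times \cdots \times a_n : \mathcal{B}^{L_N}(\vec{u}_n)\}$$

is in $L_N$. Since $(\neg \mathcal{B})^{L_N}$ is $\neg(\mathcal{B}^{L_N})$,

$$\mathfrak{F}_{\mathcal{A}}(a_1, \dots, a_n) = a_1 \times \cdots \times a_n - \mathfrak{F}_{\mathcal{B}}(a_1, \dots, a_n)$$

and the result follows from VI.9.5 and VI.9.8.

Case $\mathcal{A}(\vec{u}_n)$ is $\mathcal{B}(\vec{u}_n) \vee \mathcal{C}(\vec{u}_m)$, say, with $m \leq n$. By I.H.,

$$\mathfrak{F}_{\mathcal{B}}(a_1, \dots, a_n) = \{(u_1, \dots, u_n) \in a_1 \times \cdots \times a_n : \mathcal{B}^{L_N}(\vec{u}_n)\}$$

and

$$\mathfrak{F}_{\mathcal{C}}(a_1, \dots, a_m) = \{(u_1, \dots, u_m) \in a_1 \times \cdots \times a_m : \mathcal{C}^{L_N}(\vec{u}_m)\}$$

are in $L_N$. Since $(\mathcal{B} \vee \mathcal{C})^{L_N}$ is $(\mathcal{B}^{L_N}) \vee (\mathcal{C}^{L_N})$,

$$\mathfrak{F}_{\mathcal{A}}(a_1, \dots, a_n) = \mathfrak{F}_{\mathcal{B}}(a_1, \dots, a_n) \cup \mathfrak{F}_{\mathcal{C}}(a_1, \dots, a_m) \times a_{m+1} \times \cdots \times a_n$$

and the result follows from VI.9.5 and VI.9.8 ("$\times\, a_{m+1} \times \cdots \times a_n$" above is
absent if n = m).

Finally,

Case $\mathcal{A}(\vec{u}_n)$ is $(\exists y)\mathcal{B}(\vec{u}_n, y)$. By I.H.,

$$\mathfrak{F}_{\mathcal{B}}(a_1, \dots, a_n, b) = \{(u_1, \dots, u_n, y) \in a_1 \times \cdots \times a_n \times b : \mathcal{B}^{L_N}(\vec{u}_n, y)\} \qquad (1)$$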
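is in $L_N$ for any N-constructible $a_1, \dots, a_n, b$. As we want to show that

$$\mathfrak{F}_{\mathcal{A}}(a_1, \dots, a_n) = \{(u_1, \dots, u_n) \in a_1 \times \cdots \times a_n : [(\exists y)\mathcal{B}(\vec{u}_n, y)]^{L_N}\}$$
$$= \{(u_1, \dots, u_n) \in a_1 \times \cdots \times a_n : (\exists y \in L_N)\mathcal{B}^{L_N}(\vec{u}_n, y)\}$$

is in $L_N$, it suffices to prove there is an N-constructible set b for which (3)
below holds. Then

$$\mathfrak{F}_{\mathcal{A}}(a_1, \dots, a_n) = \mathrm{dom}(\mathfrak{F}_{\mathcal{B}}(a_1, \dots, a_n, b)) \qquad (2)$$

and we are done by VI.9.9.

Now consider

$$(\forall (\vec{u}_n) \in a_1 \times \cdots \times a_n)(\exists y \in b)\big(y \in L_N \wedge [\mathcal{B}^{L_N}(\vec{u}_n, y) \vee y = \emptyset \wedge \neg(\exists z \in L_N)\mathcal{B}^{L_N}(\vec{u}_n, z)]\big) \qquad (3)$$

We prove (3) as a consequence of

$$(\forall (\vec{u}_n) \in a_1 \times \cdots \times a_n)(\exists y)\big(y \in L_N \wedge [\mathcal{B}^{L_N}(\vec{u}_n, y) \vee y = \emptyset \wedge \neg(\exists z \in L_N)\mathcal{B}^{L_N}(\vec{u}_n, z)]\big) \qquad (4)$$

Let us prove (4). Let $(\vec{u}_n) \in a_1 \times \cdots \times a_n$.

Case 1. $(\exists y)(y \in L_N \wedge \mathcal{B}^{L_N}(\vec{u}_n, y))$. Then (4) follows by tautological
implication and $\exists$-monotonicity.

Case 2. $\neg(\exists z)(z \in L_N \wedge \mathcal{B}^{L_N}(\vec{u}_n, z))$. Note that $\emptyset$ is constructible: If $N \neq
\emptyset$, then $\emptyset = p - p$,† where $p \in N$ (p is, of course, in $L_N$). If $N = \emptyset$, then
$\emptyset = F_0$. Thus $\emptyset \in L_N \wedge \emptyset = \emptyset \wedge \neg(\exists z \in L_N)\mathcal{B}^{L_N}(\vec{u}_n, z)$ is provable; hence so
is $(\exists y)(y \in L_N \wedge y = \emptyset \wedge \neg(\exists z \in L_N)\mathcal{B}^{L_N}(\vec{u}_n, z))$ by the substitution axiom.
Now (4) follows once more by tautological implication and $\exists$-monotonicity.

By collection,‡ there is a set A (new constant) such that

$$(\forall (\vec{u}_n) \in a_1 \times \cdots \times a_n)(\exists y \in A)\big(y \in L_N \wedge [\mathcal{B}^{L_N}(\vec{u}_n, y) \vee y = \emptyset \wedge \neg(\exists z \in L_N)\mathcal{B}^{L_N}(\vec{u}_n, z)]\big)$$

† Recall that the formal difference makes sense on atoms. Cf. III.4.16.
‡ "$y \in L_N$" is, of course, the set theory formula "$(\exists \alpha)\, y = F_\alpha$"; see Definition VI.9.2.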
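Take then for b (new constant) any N-constructible set satisfying $A \cap L_N \subseteq b$
(by VI.9.7). We can now verify (2):

$$(\vec{u}_n) \in \mathfrak{F}_{\mathcal{A}}(a_1, \dots, a_n)$$
$$\leftrightarrow$$
$$(\vec{u}_n) \in a_1 \times \cdots \times a_n \wedge (\exists y \in L_N)\mathcal{B}^{L_N}(\vec{u}_n, y)$$
$$\leftrightarrow \quad \{\to: \text{by (4), (3)}; \ \leftarrow: \text{by (3) and VI.9.3, since } b \in L_N\}$$
$$(\vec{u}_n) \in a_1 \times \cdots \times a_n \wedge (\exists y \in b)\mathcal{B}^{L_N}(\vec{u}_n, y)$$
$$\leftrightarrow$$
$$(\exists y)\big(y \in b \wedge (\vec{u}_n) \in a_1 \times \cdots \times a_n \wedge \mathcal{B}^{L_N}(\vec{u}_n, y)\big)$$
$$\leftrightarrow$$
$$(\vec{u}_n) \in \mathrm{dom}(\mathfrak{F}_{\mathcal{B}}(a_1, \dots, a_n, b))$$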

VI.9.13 Theorem. $\mathfrak{J} = (L_{\mathrm{Set}}, \mathrm{ZF}, L_N)$ is a formal model of ZFC.

N.B. Actually, $L_{\mathrm{Set}}$ and ZF contain N and f and their axiom (VI.9.2).

Proof. The proof is in ZF. First off, $L_N \neq \emptyset$. Indeed, we have shown in the
course of the previous proof that $\emptyset \in L_N$. We verify now the ZFC axioms.

(1) Extensionality holds in $L_N$ by VI.9.4 and VI.8.10.

(2) For the axiom

$$U(b) \to \neg(\exists x)\, x \in b$$

we want

$$b \in L_N \to U(b) \to \neg(\exists x \in L_N)\, x \in b$$

which follows from the ZF version preceding it.

(3) The axiom of separation says that

$$A(\vec{u}_{i-1}, a, u_{i+1}, \dots, u_n) = \{u_i : u_i \in a \wedge \mathcal{P}(\vec{u}_n)\}$$

is a set (parametrized by the free variables a and $u_j$, $1 \leq j < i \vee i < j \leq n$)
for any formula $\mathcal{P}$. The relativized version asserts that

$$A^{L_N}(\vec{u}_{i-1}, a, u_{i+1}, \dots, u_n) = \{u_i \in L_N : u_i \in a \wedge \mathcal{P}^{L_N}(\vec{u}_n)\} \qquad (i)$$

is constructible from N whenever a and $u_j$, $1 \leq j < i \vee i < j \leq n$, are
(cf. VI.8.13).

So, we let a and $u_j$, $1 \leq j < i \vee i < j \leq n$, be in $L_N$ and prove that
$A^{L_N}(\vec{u}_{i-1}, a, u_{i+1}, \dots, u_n) \in L_N$. By transitivity of $L_N$, (i) simplifies to

$$A^{L_N}(\vec{u}_{i-1}, a, u_{i+1}, \dots, u_n) = \{u_i : u_i \in a \wedge \mathcal{P}^{L_N}(\vec{u}_n)\} \qquad (i')$$
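Now, by VI.9.5, VI.9.12, and VI.9.8,

$$\mathfrak{F}_{\mathcal{P}}(\{u_1\}, \dots, \{u_{i-1}\}, a, \{u_{i+1}\}, \dots, \{u_n\}) = \{(\vec{x}_n) \in \{u_1\} \times \cdots \times \{u_{i-1}\} \times a \times \{u_{i+1}\} \times \cdots \times \{u_n\} : \mathcal{P}^{L_N}(\vec{x}_n)\}$$

is constructible. Hence

$$A^{L_N}(\vec{u}_{i-1}, a, u_{i+1}, \dots, u_n) = \begin{cases} \mathrm{ran}\,\underbrace{\mathrm{dom} \cdots \mathrm{dom}}_{n-i \text{ terms}}(\mathfrak{F}_{\mathcal{P}}) & \text{if } i > 1 \\ \underbrace{\mathrm{dom} \cdots \mathrm{dom}}_{n-1 \text{ terms}}(\mathfrak{F}_{\mathcal{P}}) & \text{otherwise} \end{cases}$$

which is in $L_N$, considering that $\mathrm{ran}(x) = \mathrm{dom}(\mathfrak{F}_{10}(x))$.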


(4) Existence of the set of urelements: {x : U(x)} is a set. We want to prove
(cf. VI.8.13)

$$\{x \in L_N : U(x)\} \in L_N$$

In view of the already noted $sp(F_\alpha) \subseteq N$ and $N \subseteq L_N$ ($N = \{F_\alpha : \alpha < \|N\|\}$),
the above translates to $N \in L_N$. Now, since $N \subseteq L_N$ (and N is a set), there
is, by VI.9.7, an N-constructible set A such that $N \subseteq A$. But then $N =
\{x \in A : U(x)\}$; hence it is constructible by $L_N$-separation ((3) above).

(5) The pairing axiom: For any atoms or sets a and b, there is a set c such
that $a \in c$ and $b \in c$. Thus, we want (by VI.8.13) to show that {a, b} is defined
in $L_N$. Since $\{a, b\}^{L_N} = \{a, b\}$, this follows from VI.9.5.

(6) The union axiom states, essentially, that if A is a set, then $\bigcup A$ is a set.
For the $L_N$ version we need $\bigcup$ to be defined in $L_N$ (again by VI.8.13). Since
(by VI.8.16 and VI.9.4) $\bigcup$ is absolute for $L_N$, we need to show that $L_N$ is
$\bigcup$-closed. For $A \in L_N$, $\bigcup A = \{x : (\exists y \in A)\, x \in y\} \subseteq L_N$ by VI.9.3. Hence
$\bigcup A \subseteq b \in L_N$ for some b (by VI.9.7), and $\bigcup A \in L_N$ by $L_N$-separation.

(7) Foundation holds in $L_N$ by VI.8.11.


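(8) Collection says that for any set $A$ and formula $\mathcal{P}[x, y]$,

$$(\forall x \in A)(\exists y)\mathcal{P}[x, y] \to (\exists z)(\forall x \in A)(\exists y \in z)\mathcal{P}[x, y]$$

Letting $A \in L_N$, we want to prove the relativized version

$$(\forall x \in A)(\exists y \in L_N)\mathcal{P}^{L_N}[x, y] \to (\exists z \in L_N)(\forall x \in A)(\exists y \in z)\mathcal{P}^{L_N}[x, y] \qquad \text{(iii)}$$

So assume the hypothesis of (iii), i.e.,

$$(\forall x \in A)(\exists y)\big[(\exists\alpha)\, y = F_\alpha \land \mathcal{P}^{L_N}[x, y]\big]$$

By collection (in ZF), there is a set $w$ (new constant) such that

$$(\forall x \in A)(\exists y \in w)\big[(\exists\alpha)\, y = F_\alpha \land \mathcal{P}^{L_N}[x, y]\big] \qquad \text{(iv)}$$

By VI.9.7 there is a set $s \in L_N$ (new constant) such that $w \cap L_N \subseteq s$. Let

$$z = \{y \in s : (\exists x \in A)\mathcal{P}^{L_N}[x, y]\} \qquad \text{(v)}$$

By $L_N$-separation, $z \in L_N$. Moreover, this $z$ works to establish the conclusion
of (iii). Indeed, let $x \in A$. By (iv) we derive

$$(\exists y)(\exists\alpha)\big[y \in w \land y = F_\alpha \land \mathcal{P}^{L_N}[x, y]\big]$$

since without loss of generality $\alpha$ is not free in $\mathcal{P}$. Introduce now new constants
$Y$ and $\beta$ and the assumption

$$Y \in w \land Y = F_\beta \land \mathcal{P}^{L_N}[x, Y]$$

The first two conjuncts imply that $Y \in s$; hence, by (v), $Y \in z$. Thus,

$$Y \in z \land \mathcal{P}^{L_N}[x, Y]$$

hence $(\exists y)(y \in z \land \mathcal{P}^{L_N}[x, y])$, from which generalization and the underlined
assumption (deduction theorem used) yield

$$(\forall x \in A)(\exists y \in z)\mathcal{P}^{L_N}[x, y]$$

The right hand side of (iii) is now obtained by the substitution axiom and modus
ponens.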

(9) The power set axiom says that for any set $A$, $\{x : x \subseteq A\}$ is a set too. For
the $L_N$ version we want $P(A)$ to be defined in $L_N$. Now $P^{L_N}(A) = P(A) \cap L_N$
(Exercise VI.60), a set by the power set axiom and separation in ZF. Since
$P(A) \cap L_N \subseteq L_N$, $P^{L_N}(A)$ is constructible by VI.9.7 and $L_N$-separation.


(10) The relativization of the axiom of infinity, essentially, says $\omega^{L_N} \in L_N$.
Thus, if we can prove that $\omega \in L_N$, then we will be done, since $\omega^{L_N} = \omega$ will
follow (by remarks in VI.8.18). We will prove a bit more for convenience,
namely,

$$\mathrm{On} \subseteq L_N$$

So take as induction hypothesis that

$$(\forall \beta < \alpha)\, \beta \in L_N$$

Proceeding now exactly as in the proof of VI.9.7, there is a limit ordinal $\gamma$ such
that $\alpha \subseteq F_\gamma$. Let $A = \{\sigma : \sigma \in F_\gamma\}$ and $\tau \in \sigma \in A$. First, $\sigma = F_\rho$ for $\rho < \gamma$,
by VI.9.3. Hence, $\tau \in \sigma \subseteq F_\gamma$, since $\mathrm{Lim}(\gamma)$ (see VI.9.2). Thus, $A$ is transitive;
hence $A \in \mathrm{On}$. Furthermore, $A \in L_N$ by $L_N$-separation.† Clearly, $\alpha \subseteq A$. If
$\alpha = A$, then $\alpha \in L_N$. If $\alpha \in A$, then again $\alpha \in L_N$ by transitivity of $L_N$.


(11) The axiom of choice, relativized in $L_N$, says that if $A$ is a (constructible)
set of nonempty constructible sets, then there is a choice function $c$ with
$\mathrm{dom}(c) = A$ such that $(\forall x \in A)\, c(x) \in x$. To prove this just take

$$c(x) = F_{\min\{\alpha \,:\, F_\alpha \in x\}}$$
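The definition of $c$ can be mimicked on a finite scale. The following is only a minimal sketch under stated assumptions: a hypothetical finite list `F` stands in for the enumeration $\alpha \mapsto F_\alpha$ (list positions play the role of ordinals), and the names `F`, `choice`, and `A` are illustrative, not taken from the text.

```python
# Finite analogue of c(x) = F_{min{alpha : F_alpha in x}}.
# F is a hypothetical enumeration; list positions stand in for ordinals.
F = ["a", "b", "c", "d", "e"]


def choice(x):
    """Return the member of the nonempty set x that has the least index in F."""
    least = min(i for i, obj in enumerate(F) if obj in x)
    return F[least]


# A family of nonempty sets of enumerated objects.
A = [{"d", "b"}, {"e"}, {"c", "a", "e"}]

# The choice function restricted to A, as a dictionary.
c = {frozenset(x): choice(x) for x in A}
```

The point of the design is that no arbitrary choices are made: the well-ordering of the indices decides which member is picked, which is exactly why the enumeration yields a choice function.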


VI.9.14 Corollary. If ZF is consistent, then so is ZFC.


Proof. By our previous construction and the results of Section I.7. Note that the
auxiliary constant metatheorem is being invoked, since neither the hypothesis
nor the conclusion refers to the new constants $N$ and $f$ introduced as per the
footnote on p. 396. Cf. p. 75.


Gödel also showed that the generalized continuum hypothesis is true in $L_N$,
and hence consistent with ZF and AC (see VII.7.25).

We conclude the section by briefly exploring some easy consequences of 
absoluteness considerations, mostly stated as exercises. 


VI.9.15 Exercise. Prove that all Gödel operations, including the three derived
ones, are absolute for transitive models of ZF. Are they absolute for any other
classes?


VI.9.16 Exercise. Prove that the function $\lambda\alpha.F_\alpha$ is absolute for $L_N$ when the
constants $N$, $f$ are interpreted as themselves.

Hint. This follows from VI.9.15 and techniques in Section VI.8 (cf. in particular
VI.8.20 and VI.8.26). Note that $f : \mathrm{dom}(f) \to N$ is in $L_N$. This is so
by $\mathrm{dom}(f) \in \mathrm{On} \subseteq L_N$, $f \subseteq \mathrm{dom}(f) \times N$, and closure of $L_N$ under $\times$ (now
apply $L_N$-separation).


The axiom of constructibility says that all objects are in $L_N$, or all objects
are constructible.‡ Formally then it is

$$(\forall x)(\exists\alpha)\, x = F_\alpha \qquad (V = L)$$

It is denoted by “V = L” or “$\mathbf{V} = \mathbf{L}$” depending on typeface preference.


† $A = \{\sigma : \sigma \in F_\gamma\} = \{x \in F_\gamma : x \text{ is an ordinal}\}$. See VI.8.16.
‡ In atomless approaches to ZF, an object has to be a set.

It must be noted however that the formula displayed generically as $(V = L)$
depends on the choice of $N$ used in the construction of the function $\alpha \mapsto F_\alpha$,
and hence of $L_N$. What $N$ we have in mind in a particular argument should
be clear from the context, if the particular choice affects results claimed. Set
theorists do not believe that the axiom of constructibility is (really) true, but
they find it interesting as a “temporary” axiom. For one thing, it is harmless:


VI.9.17 Theorem. If ZF is consistent, then so is ZF + $(V = L)$. Indeed, $V = L$
is true in $\mathfrak{J} = (L_{\mathrm{Set}}, \mathrm{ZF}, L_N)$, where ZF is extended as in VI.9.2.


Proof. $(V = L)^{L_N}$ is $(\forall x \in L_N)(\exists\alpha \in L_N)(x = F_\alpha)^{L_N}$; hence, by VI.9.16 and
$\mathrm{On} \subseteq L_N$, one needs to prove

$$(\forall x \in L_N)(\exists\alpha)\, x = F_\alpha$$

This is precisely VI.9.2.


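For another thing, it simplifies the relativization of arguments to $L_N$: Suppose
that we want to prove (having set $N^{L_N} = N$ and $f^{L_N} = f$)

$$\vdash_{\mathrm{ZF}} \mathcal{A}^{L_N} \qquad (1)$$

for some sentence $\mathcal{A}$ of the extended $L_{\mathrm{Set}}$. Suppose that we prove

$$\vdash_{\mathrm{ZF}+(V=L)} \mathcal{A} \qquad (2)$$

instead, which results in a proof of (by the deduction theorem)

$$\vdash_{\mathrm{ZF}} V = L \to \mathcal{A} \qquad (3)$$

Since $\mathfrak{J}$ is a formal model of ZF (the $N$, $f$-axiom is true in $\mathfrak{J}$), (3) implies
(cf. I.7.9)

$$\models_{\mathfrak{J}} (V = L) \to \mathcal{A}$$

But $\models_{\mathfrak{J}} (V = L)$ by VI.9.17; thus $\models_{\mathfrak{J}} \mathcal{A}$; that is, we have derived (1).


Conversely, suppose we have proved (1). We show that (2) follows. To this
end we show by induction on formulas that

$$V = L \vdash \mathcal{A} \leftrightarrow \mathcal{A}^{L_N} \qquad (4)$$

We just check the interesting case where $\mathcal{A} \equiv (\exists x)\mathcal{B}$. Now $\mathcal{A}^{L_N} \equiv (\exists x)((\exists\alpha)\,
x = F_\alpha \land \mathcal{B}^{L_N})$, while

$$V = L \vdash (\exists x)((\exists\alpha)\, x = F_\alpha \land \mathcal{B}^{L_N}) \leftrightarrow (\exists x)((\exists\alpha)\, x = F_\alpha \land \mathcal{B}) \qquad (5)$$

by the I.H. But $V = L \vdash ((\exists\alpha)\, x = F_\alpha \land \mathcal{B}) \leftrightarrow \mathcal{B}$, since the hypothesis,
$(\forall x)(\exists\alpha)\, x = F_\alpha$, implies $(\exists\alpha)\, x = F_\alpha$. This and (5) yield (4) via the Leibniz
rule.

Thus, if we have (1), then we also have $\vdash_{\mathrm{ZF}+(V=L)} \mathcal{A}^{L_N}$, and hence (2),
by (4).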


The moral, in plain English, is: 


VI.9.18 Remark. To prove in ZF a sentence $\mathcal{A}$ relativized to $L_N$, add the
axiom of constructibility and prove instead the unrelativized sentence $\mathcal{A}$.


VI.9.19 Exercise. Prove that $L_N$ is the $\subseteq$-smallest proper class (formal) model
of ZF among those that contain $N$. In particular, $L$ (cf. VI.9.2) is the $\subseteq$-smallest
proper class (formal) model of ZF.

Hint. Indeed, let $M$ be a proper class model of ZF where $N \in M$. First show
that $\mathrm{On} \subseteq M$: Let $\alpha \in \mathrm{On}$. Look for a $\beta \in M$ such that $\alpha < \beta$. For example,
as $M$ is not a set, pick an $x \in M - (N \cup V_N(\alpha))$. By VI.8.25, $\rho(x) \in M$. This
is a good enough $\beta$. Conclude by computing $L_N^{M}$. Towards this you will need
that $\lambda\alpha.F_\alpha$ is absolute for $M$, in particular, that $f$ of VI.9.2 is in $M$. The latter
is argued as in VI.9.16, using the absoluteness of $\times$ (cf. VI.8.16).


VI.9.20 Remark. Our discussion has focused so far on “large” models $M$, i.e.,
proper class models. These have the advantage of containing all the ordinals.
One is also interested in “small” transitive $(U, \in)$-models of ZF, $M$, where $M$ is a
set. We can easily adjust the constant $N$ of p. 396 so that the related assumption is

$$\neg U(N) \land (\forall z \in N)U(z) \land f \text{ is a 1-1 function} \land \mathrm{dom}(f) \subseteq \omega \land \mathrm{ran}(f) = N$$

Since $M$ satisfies infinity, the constant $\omega$ is absolute for $M$. The additional
assumption that $N \in M$ will then yield that $\lambda\alpha.F_\alpha$ is absolute for $M$
(cf. VI.8.19).


VI.10. Arithmetic on the Ordinals 


Infinite sequences extend the notion of an n-tuple, while transfinite sequences 
extend the notion of an infinite sequence. Just as in the case of finite sequences 
(V.1.28—V.1.29), where we can juxtapose or concatenate them to obtain a se- 
quence whose length is the sum of the lengths of the originals (V.1.30), we are 
able to do that with arbitrarily “long” WO sets, in particular with canonical WO 
sets, i.e., ordinals. 


For example, on concatenating $\alpha$ and $\beta$ in that order ($\alpha * \beta$) we expect,
informally speaking, to end up with the sequence

$$\underbrace{0, 1, 2, \ldots}_{\alpha}\ \ \underbrace{0, 1, 2, \ldots}_{\beta} \qquad (1)$$

of length, intuitively, $\alpha + \beta$, as is the case when $\alpha, \beta$ are natural numbers. We
would also like to be able to iterate concatenation, for example concatenating
$\alpha$ with itself “$\beta$ times”, to obtain

$$\underbrace{\alpha * \alpha * \alpha * \cdots}_{\text{a sequence of } \beta \text{ copies of } \alpha} \qquad (2)$$

of length, intuitively, $\alpha \cdot \beta$.

Let us first make addition of ordinals precise by extending the recursive
definition of addition over $\omega$ (V.1.22).†


VI.10.1 Definition (Addition of Ordinals). The unique function $A(\alpha, \beta)$ given
by the recursion below will be denoted by “$\alpha + \beta$”:

$$\begin{aligned}
A(\alpha, 0) &= \alpha \\
A(\alpha, \beta + 1) &= A(\alpha, \beta) + 1 \\
\text{for } \mathrm{Lim}(\beta),\quad A(\alpha, \beta) &= \sup\{A(\alpha, \gamma) : \gamma < \beta\}
\end{aligned}$$

In the definition we use “+” with potentially two different meanings: The new
meaning is given in the definition. The old meaning is in the use of “+1” to
mean ordinal successor. It will turn out that the two meanings are consistent.
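The recursion can be spot-checked on a concrete fragment of On. The sketch below is not the recursion itself: it assumes ordinals below $\omega^{\omega}$ written in Cantor normal form, a representation this section does not introduce, and the names `add`, `ZERO`, `ONE`, and `OMEGA` are illustrative. On that fragment it computes $\alpha + \beta$ directly, so that clauses such as $A(\alpha, \beta + 1) = A(\alpha, \beta) + 1$ can be tested on examples.

```python
# Assumption of this sketch: an ordinal below omega^omega is a list of
# (exponent, coefficient) pairs in Cantor normal form, with strictly
# decreasing natural exponents and positive natural coefficients; [] is 0.
ZERO = []
ONE = [(0, 1)]
OMEGA = [(1, 1)]


def add(a, b):
    """Ordinal sum a + b of two ordinals in Cantor normal form."""
    if not b:
        return list(a)
    lead = b[0][0]
    # Terms of a with exponent above b's leading exponent survive unchanged.
    kept = [(e, c) for (e, c) in a if e > lead]
    # A term of a with the same exponent merges with b's leading term;
    # smaller terms of a are absorbed and disappear.
    same = sum(c for (e, c) in a if e == lead)
    return kept + [(lead, same + b[0][1])] + list(b[1:])


a = [(1, 2), (0, 3)]  # omega*2 + 3
b = [(1, 1), (0, 1)]  # omega + 1
successor_clause_holds = add(a, add(b, ONE)) == add(add(a, b), ONE)
```

For instance, `add(ONE, OMEGA)` returns `OMEGA`, whereas `add(OMEGA, ONE)` keeps both terms; the absorption step in `add` is what makes the operation non-commutative.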


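VI.10.2 Proposition. $\lambda\beta.\alpha + \beta$ is normal for any $\alpha$.


Proof. By VI.5.38 and the weak continuity of $\lambda\beta.\alpha + \beta$, we only need to show
that $\alpha + \beta < \alpha + (\beta + 1)$. But, by VI.10.1, this translates to

$$\alpha + \beta < (\alpha + \beta) + 1 = (\alpha + \beta) \cup \{\alpha + \beta\}$$


VI.10.3 Corollary. If $\mathrm{Lim}(\beta)$, then $\mathrm{Lim}(\alpha + \beta)$.


Proof. By VI.5.40.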


We next prove the analogue of V.1.25, which shows that Definition VI.10.1 
indeed captures the intuitive meaning of (1) in the preamble to this section. 


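† All the results in this section are due to Cantor.

VI.10.4 Theorem. $\alpha + \beta = \alpha \cup \{\alpha + \gamma : \gamma < \beta\}$.


Proof. We do induction on $\beta$. The basis and the case $\beta + 1$ are handled exactly
as in V.1.25.


So, let $\mathrm{Lim}(\beta)$, and assume (I.H.) that whenever $\gamma < \beta$,

$$\alpha + \gamma = \alpha \cup \{\alpha + \lambda : \lambda < \gamma\} \qquad (1)$$

Now $\beta > 0$; hence $\alpha < \alpha + \beta$, using VI.10.2 (since $\alpha + 0 = \alpha$ by VI.10.1); thus
$\alpha \subseteq \alpha + \beta$. Moreover, by VI.10.2 again, $\alpha + \gamma < \alpha + \beta$ for $\gamma < \beta$, that is, $\alpha + \gamma \in \alpha + \beta$.
Thus

$$\alpha + \beta \supseteq \alpha \cup \{\alpha + \rho : \rho < \beta\} \qquad (2)$$

Let now $\delta \in \alpha + \beta = \bigcup\{\alpha + \tau : \tau < \beta\}$. Thus, $\delta \in \alpha + \tau$ for some $\tau < \beta$,
and, by (1), $\delta \in \alpha$ or $\delta = \alpha + \lambda$ for some $\lambda < \tau$. Thus $\delta$ is in the right hand
side of (2), and we get the converse inclusion of (2).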
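As a small worked instance of VI.10.4 (an illustration added here, with $\alpha = \omega$ and $\beta = 2$ chosen only for concreteness), the theorem lists the members of $\omega + 2$ explicitly:

```latex
\begin{align*}
\omega + 2 &= \omega \cup \{\omega + \gamma : \gamma < 2\} \\
           &= \omega \cup \{\omega + 0,\ \omega + 1\} \\
           &= \{0, 1, 2, \ldots\} \cup \{\omega,\ \omega + 1\}
\end{align*}
```

The first block of members is a copy of $\alpha$ and the second block is a copy of $\beta$ shifted past $\alpha$, which is the informal picture of concatenation from the preamble to this section.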


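VI.10.5 Remark (On Notation). (1) We re-examine the notation $\alpha + 1$, which
we have adopted to mean $\alpha \cup \{\alpha\}$. Using the theorem above and thinking of
“+” here as addition rather than successor, we get

$$\begin{aligned}
\alpha + 1 &= \alpha \cup \{\alpha + \gamma : \gamma < 1\} \\
&= \alpha \cup \{\alpha + 0\} \\
&= \alpha \cup \{\alpha\} \qquad \text{by VI.10.1}
\end{aligned}$$

Thus the new notation $\alpha + \beta$ and the old $\alpha + 1$ mean the same thing when
$\beta = 1$.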

(2) Often, the notation + is used instead of + for ordinal addition, + being 
reserved for cardinal addition. For the addition of cardinals we will use +,. We 
use + for something else (see below). © 


We next relate the ordinal of a WO set obtained by concatenation of two
WO sets to those of the originals. As intuition dictates, it will be the sum of the
two original ordinals. Thus the length of the concatenation of two WO sets is the
sum of the lengths of its two components, as it should be.


VI.10.6 Definition.

(i) The disjoint union of sets $X$ and $Y$ is the set $\{0\} \times X \cup \{1\} \times Y$, denoted
$X + Y$.

(ii) If $(X_\beta)_{\beta \in \alpha}$ is an $\alpha$-sequence of sets, then their ordered disjoint sum is
$\bigcup_{\beta \in \alpha}(\{\beta\} \times X_\beta)$ and is denoted by $\sum_{\beta \in \alpha} X_\beta$.

Clearly, $X_0 + X_1 = \sum_{\beta \in 2} X_\beta$. Note that $X + Y \neq Y + X$ in general (e.g., take
$X = \{0\}$ and $Y = \{1\}$).


VI.10.7 Definition.

(i) Let $(X, <_1)$ and $(Y, <_2)$ be two WO sets. Then $<_+$ on $X + Y$ is defined
lexicographically, i.e.,

$$(i, x) <_+ (j, y) \quad\text{iff}\quad i < j \ \lor\ (i = j = 0 \land x <_1 y) \ \lor\ (i = j = 1 \land x <_2 y)$$

(ii) Let $(X_\beta, <_\beta)$ be a WO set for all $\beta \in \alpha$. Then $<_\Sigma$ on $\sum_{\beta \in \alpha} X_\beta$ is defined
lexicographically by

$$(\beta, x) <_\Sigma (\gamma, y) \quad\text{iff}\quad \beta < \gamma \ \lor\ (\beta = \gamma \land x <_\beta y)$$


VI.10.8 Proposition. $<_+$ and $<_\Sigma$ of VI.10.7 are indeed well-orderings.


Proof. That they are linear orders is straightforward. Moreover, in either case,
any nonempty set of pairs $(\beta, x)$ has a minimum: Locate the ones with
$<$-minimum $\beta$ among such pairs; in this set locate the pair with $<_\beta$-minimum
$x$ (in $X_\beta$).


VI.10.9 Definition (Concatenation of WO Sets). Given WO sets $(X_\beta, <_\beta)$ for
all $\beta \in \alpha$, their concatenation $\mathrm{cat}_{\beta \in \alpha}(X_\beta, <_\beta)$ is the WO set $(\sum_{\beta \in \alpha} X_\beta, <_\Sigma)$.


In particular, when $\alpha = 2$ we write also $(X_0, <_0) * (X_1, <_1)$ for the concatenation,
and we denote $<_\Sigma$ by $<_+$ in this case.


The following theorem shows that the definition of addition of ordinals is 
appropriate. 


VI.10.10 Theorem. If $\|(X_0, <_0)\| = \alpha$ and $\|(X_1, <_1)\| = \beta$, then $\|(X_0, <_0) *
(X_1, <_1)\| = \alpha + \beta$.


Proof. Let

$$\alpha \overset{f}{\cong} X_0 \qquad (1)$$

and

$$\beta \overset{g}{\cong} X_1 \qquad (2)$$

where we have omitted the relevant orders to the left ($\in$) and right ($<_i$, $i = 0, 1$)
of $\cong$ for simplicity of notation.


Let $\tilde{f} = \{(\gamma, (0, f(\gamma))) : \gamma \in \alpha\}$ and $\tilde{g} = \{(\alpha + \gamma, (1, g(\gamma))) : \gamma \in \beta\}$.
Since $\alpha \le \alpha + \gamma$ (with the “=” when $\gamma = 0$), we have $\mathrm{dom}(\tilde{f}) \cap \mathrm{dom}(\tilde{g}) = \emptyset$;
hence $H = \tilde{f} \cup \tilde{g}$ is a total function on $\mathrm{dom}(\tilde{f}) \cup \mathrm{dom}(\tilde{g})$, that is, on $\alpha + \beta$,
by VI.10.4. $H$ is onto $X_0 + X_1$ since $\tilde{f}$ and $\tilde{g}$ are onto $\{0\} \times X_0$ and $\{1\} \times X_1$
respectively, by (1) and (2).


By (1) and (2), and Definition VI.10.7, $H$ is order-preserving and hence an
isomorphism between $\alpha + \beta$ and $(X_0 + X_1, <_+)$.
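For finite WO sets the map $H$ of this proof can be written out and checked mechanically. The following is a sketch under stated assumptions: a finite well-ordered set is modelled as a Python list of distinct items given in increasing order, and the names `concat`, `H`, `X0`, and `X1` are illustrative only.

```python
# Finite well-ordered sets are modelled as lists of distinct items,
# listed in increasing order (an assumption of this sketch).

def concat(x0, x1):
    """Members of {0} x X0 u {1} x X1, listed in the lexicographic order."""
    pairs = [(0, x) for x in x0] + [(1, y) for y in x1]
    # Rank of a tagged pair: tag first, then position inside its own set.
    rank = {(0, x): (0, i) for i, x in enumerate(x0)}
    rank.update({(1, y): (1, j) for j, y in enumerate(x1)})
    return sorted(pairs, key=lambda p: rank[p])


def H(x0, x1):
    """The map of the proof: gamma -> (0, f(gamma)), alpha + gamma -> (1, g(gamma))."""
    alpha = len(x0)
    h = {gamma: (0, x0[gamma]) for gamma in range(alpha)}
    h.update({alpha + gamma: (1, x1[gamma]) for gamma in range(len(x1))})
    return h


X0 = ["c", "a"]       # the order on X0 is c < a
X1 = ["a", "b", "z"]  # the order on X1 is a < b < z
order = concat(X0, X1)
h = H(X0, X1)
```

Listing `h[0], h[1], ..., h[4]` reproduces `order`, so `h` is an order isomorphism from $2 + 3 = 5$ onto the concatenation. Note that the item `"a"` occurs in both sets; the tags 0 and 1 keep the two copies apart, which is the purpose of the disjoint union.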


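VI.10.11 Proposition (Some Properties of Ordinal Addition).

(i) $\alpha + (\beta + \gamma) = (\alpha + \beta) + \gamma$
(ii) $\alpha \le \beta \to \alpha + \gamma \le \beta + \gamma$
(iii) $\alpha < \beta \leftrightarrow \gamma + \alpha < \gamma + \beta$
(iv) $0 + \alpha = \alpha$
(v) $\alpha < \beta \leftrightarrow (\exists \gamma > 0)\, \alpha + \gamma = \beta$.


Proof. (i): We do induction on $\gamma$. Assume then the claim for all $\lambda < \gamma$.

By VI.10.4,

$$\begin{aligned}
(\alpha + \beta) + \gamma &= (\alpha + \beta) \cup \{(\alpha + \beta) + \lambda : \lambda < \gamma\} \\
&= (\alpha + \beta) \cup \{\alpha + (\beta + \lambda) : \lambda < \gamma\} && \text{by I.H.} \\
&= \alpha \cup \{\alpha + \delta : \delta < \beta\} \cup \{\alpha + (\beta + \lambda) : \lambda < \gamma\} && \text{by VI.10.4} \qquad (1)
\end{aligned}$$

Next,

$$\alpha + (\beta + \gamma) = \alpha \cup \{\alpha + \delta : \delta < \beta + \gamma\} \qquad (2)$$

Since $\beta + \gamma = \beta \cup \{\beta + \lambda : \lambda < \gamma\}$, $\delta < \beta + \gamma$ is equivalent to (recall that $<$
is $\in$)

$$\delta < \beta \lor (\exists \lambda < \gamma)\, \delta = \beta + \lambda$$

Thus, (1) and (2) yield the claim.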


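(ii): We do induction on $\gamma$. The basis follows from the assumption and
$\delta + 0 = \delta$ for all $\delta$.

The case $\alpha + (\gamma + 1)$ vs. $\beta + (\gamma + 1)$ follows from $\alpha + \gamma \le \beta + \gamma$ (I.H.)
and VI.5.5.


Let next $\mathrm{Lim}(\gamma)$, and assume that $\alpha + \lambda \le \beta + \lambda$ for all $\lambda < \gamma$ (I.H.).
Then

$$\alpha + \gamma = \sup\{\alpha + \lambda : \lambda < \gamma\} \le \sup\{\beta + \lambda : \lambda < \gamma\} = \beta + \gamma$$


(iii): From VI.10.2 and trichotomy.

(iv): Assume the claim for all $\beta < \alpha$ (I.H.). By VI.10.4,

$$0 + \alpha = 0 \cup \{0 + \beta : \beta < \alpha\} = \{\beta : \beta < \alpha\} = \alpha$$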


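(v): The $\leftarrow$ follows from (iii) and VI.10.1. For $\to$, let $\alpha < \beta$, hence $\alpha \subset \beta$.
Thus the set difference $X = \beta - \alpha$ is nonempty and well-ordered by $<$. Let

$$\|(X, <)\| = \gamma \qquad (3)$$

Hence $\gamma > 0$ ($X \neq \emptyset$). The function $f$ given by

$$f(\delta) = \begin{cases} (0, \delta) & \text{if } \delta \in \alpha \\ (1, \delta) & \text{if } \delta \in X \end{cases}$$

is trivially an isomorphism

$$\beta \overset{f}{\cong} \{0\} \times \alpha \cup \{1\} \times X$$

Hence $\|(\alpha + X, <_+)\| = \beta$. But also $\|(\alpha + X, <_+)\| = \alpha + \gamma$ by (3) and VI.10.10, whence
the claim.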


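VI.10.12 Example. Note that + is not commutative on On. For example,

$$\begin{aligned}
1 + \omega &= \{0\} \cup \{1 + n : n \in \omega\} \\
&= \{0\} \cup \{n + 1 : n \in \omega\} && \text{by V.1.24} \\
&= \{n : n \in \omega\} \\
&= \omega
\end{aligned}$$

Yet, $\omega < \omega + 1$.


Here is an alternative argument: $\omega + 1$ is a successor, while $\mathrm{Lim}(1 + \omega)$
by VI.5.40, so they cannot be equal.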


We next define multiplication of ordinals by iterating addition, as is done
over $\omega$.

VI.10.13 Definition (Multiplication of Ordinals). The unique function
$M(\alpha, \beta)$ defined by the recursion below will be denoted by “$\alpha \cdot \beta$”:

$$\begin{aligned}
M(\alpha, 0) &= 0 \\
M(\alpha, \beta + 1) &= M(\alpha, \beta) + \alpha \\
\text{for } \mathrm{Lim}(\beta),\quad M(\alpha, \beta) &= \sup\{M(\alpha, \gamma) : \gamma < \beta\}
\end{aligned}$$
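As with addition, the recursion can be spot-checked on ordinals below $\omega^{\omega}$. The sketch below again assumes Cantor normal form, which this section does not introduce; `add` and `mul` are illustrative names, and `mul` relies on left distributivity and on the fact that $\alpha \cdot \omega^{e} = \omega^{e_1 + e}$ for $e > 0$, where $e_1$ is the leading exponent of $\alpha > 0$.

```python
# Assumption of this sketch: an ordinal below omega^omega is a list of
# (exponent, coefficient) pairs in Cantor normal form; [] is 0.
ONE = [(0, 1)]
OMEGA = [(1, 1)]


def add(a, b):
    """Ordinal sum a + b."""
    if not b:
        return list(a)
    lead = b[0][0]
    kept = [(e, c) for (e, c) in a if e > lead]
    same = sum(c for (e, c) in a if e == lead)
    return kept + [(lead, same + b[0][1])] + list(b[1:])


def mul(a, b):
    """Ordinal product a . b, computed term by term over b."""
    if not a or not b:
        return []
    e1, c1 = a[0]
    result = []
    for (e, c) in b:
        if e > 0:
            # a . (omega^e . c) = omega^(e1 + e) . c
            term = [(e1 + e, c)]
        else:
            # a . c for a natural c > 0 multiplies only the leading coefficient
            term = [(e1, c1 * c)] + list(a[1:])
        result = add(result, term)
    return result


a = [(1, 1), (0, 3)]  # omega + 3
b = [(1, 2), (0, 1)]  # omega*2 + 1
successor_clause_holds = mul(a, add(b, ONE)) == add(mul(a, b), a)
```

On this fragment `mul([(0, 2)], OMEGA)` yields `OMEGA` while `mul(OMEGA, [(0, 2)])` yields `add(OMEGA, OMEGA)`, so the operation is visibly non-commutative.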


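VI.10.14 Proposition. $\lambda\beta.\alpha \cdot \beta$ is normal for any $\alpha > 0$.


Proof. By VI.5.38 and the weak continuity of $\lambda\beta.\alpha \cdot \beta$, we only need to show
that $\alpha \cdot \beta < \alpha \cdot (\beta + 1)$ if $\alpha > 0$. But, by VI.10.13, this translates to

$$\alpha \cdot \beta < (\alpha \cdot \beta) + \alpha$$

which is derivable by VI.10.2.


VI.10.15 Corollary. If $\alpha > 0$ and $\mathrm{Lim}(\beta)$, then $\mathrm{Lim}(\alpha \cdot \beta)$.


Proof. By VI.5.40.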


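VI.10.16 Theorem. $\alpha \cdot \beta = \sup\{\alpha \cdot \gamma + \alpha : \gamma < \beta\}$.


Proof. We do induction on $\beta$. Let first $\beta = 0$. Then $\alpha \cdot 0 = 0$ by VI.10.13. Also
$\sup\{\alpha \cdot \gamma + \alpha : \gamma \in \beta\} = \sup\emptyset = \emptyset$. Done with the basis.


Assume next as I.H.

$$\alpha \cdot \beta = \sup\{\alpha \cdot \gamma + \alpha : \gamma < \beta\} = \bigcup_{\gamma \in \beta}(\alpha \cdot \gamma + \alpha) \qquad (1)$$

Then, since $\alpha \cdot \beta \subseteq \alpha \cdot \beta + \alpha$ by VI.10.2 (“=” corresponds to $\alpha = 0$),

$$\begin{aligned}
\alpha \cdot (\beta + 1) &= \alpha \cdot \beta + \alpha \\
&= (\alpha \cdot \beta + \alpha) \cup (\alpha \cdot \beta) \\
&= (\alpha \cdot \beta + \alpha) \cup \bigcup_{\gamma \in \beta}(\alpha \cdot \gamma + \alpha) && \text{by (1)} \\
&= \bigcup_{\gamma \in \beta + 1}(\alpha \cdot \gamma + \alpha)
\end{aligned}$$

This settles the successor case.

Finally, let $\mathrm{Lim}(\beta)$. Now, by VI.10.13,

$$\begin{aligned}
\alpha \cdot \beta &= \bigcup_{\gamma \in \beta} \alpha \cdot \gamma \\
&\subseteq \bigcup_{\gamma \in \beta}(\alpha \cdot \gamma + \alpha) && \text{since } \alpha \cdot \gamma \subseteq \alpha \cdot \gamma + \alpha \\
&= \bigcup_{\gamma \in \beta} \alpha \cdot (\gamma + 1) && \text{by VI.10.13} \\
&\subseteq \bigcup_{\gamma \in \beta} \alpha \cdot \gamma && \text{since } \gamma \in \beta \to \gamma + 1 \in \beta
\end{aligned}$$

The I.H. was not needed in this case.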


We next interpret multiplication of ordinals in terms of concatenations of 
WO sets. 


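VI.10.17 Theorem. Let $(X_\gamma, <_\gamma)$ be a WO set with order type $\alpha$, for each
$\gamma \in \beta$. Then $\|\mathrm{cat}_{\gamma < \beta}(X_\gamma, <_\gamma)\| = \alpha \cdot \beta$.


Proof. The case $\alpha = 0$ being trivial (each $X_\gamma = \emptyset$), we assume $\alpha > 0$. By
Exercise VI.35, it suffices to assume that $X_\gamma = \alpha$ for all $\gamma \in \beta$.

For ease of notation, set $A_\beta = \bigcup_{\gamma < \beta}(\{\gamma\} \times \alpha)$. Take the induction hypothesis
(on $\beta$) that

$$(\forall \gamma < \beta)\ \|(A_\gamma, <_\Sigma)\| = \alpha \cdot \gamma \qquad (1)$$

Now, since $(A_\beta, <_\Sigma)$ is a WO set, there is a unique isomorphism $\phi_{A_\beta}$ that maps
this WO set onto an ordinal. This $\phi$ is given by (see VI.4.32)

$$\phi_{A_\beta}((\gamma, \delta)) = \{\phi_{A_\beta}((\sigma, \tau)) : (\sigma, \tau) <_\Sigma (\gamma, \delta)\} = \phi_{A_\beta}\big[<_\Sigma((\gamma, \delta))\big] \qquad (2)$$

for any $(\gamma, \delta) \in A_\beta$. Using (1), we now compute the ordinal in (2).

For $\gamma < \beta$ we have

$$\begin{aligned}
<_\Sigma((\gamma, \delta)) &= \Big(\bigcup_{\eta < \gamma}\{\eta\} \times \alpha\Big) \cup \{\gamma\} \times \delta \\
&\cong \{0\} \times \Big(\bigcup_{\eta < \gamma}(\{\eta\} \times \alpha)\Big) \cup \{1\} \times \delta && \text{by Exercise VI.36} \\
&= \{0\} \times A_\gamma \cup \{1\} \times \delta
\end{aligned}$$

where we have omitted the orders on both sides of the $\cong$. Thus, by the I.H. (1),

$$\phi_{A_\beta}((\gamma, \delta)) = \phi_{A_\beta}\big[<_\Sigma((\gamma, \delta))\big] = \alpha \cdot \gamma + \delta \qquad (3)$$

We are ready to compute $\|(A_\beta, <_\Sigma)\|$.

For $\gamma < \beta$ and $\delta < \alpha$, we have $\alpha \cdot \gamma + \delta < \alpha \cdot \gamma + \alpha = \alpha \cdot (\gamma + 1)$ by VI.10.2
and VI.10.13. Thus, $\alpha \cdot \gamma + \delta < \alpha \cdot \beta$ by VI.10.14. We conclude that

$$\phi_{A_\beta}[A_\beta] \subseteq \alpha \cdot \beta \qquad (4)$$

Towards promoting (4) to equality, let $\eta \in \alpha \cdot \beta = \bigcup\{\alpha \cdot \gamma + \alpha : \gamma < \beta\}$
by VI.10.16. Thus, $\eta \in \alpha \cdot \gamma + \alpha$ for some $\gamma < \beta$; therefore $\eta \in \alpha \cdot \gamma$
or $\eta = \alpha \cdot \gamma + \delta$ for some $\delta < \alpha$, by VI.10.4. In the latter case, immediately
$\eta \in \phi_{A_\beta}[A_\beta]$ by (3). In the former case, $\alpha \cdot \gamma = \alpha \cdot \gamma + 0 \in \phi_{A_\beta}[A_\beta]$ and
transitivity of $\phi_{A_\beta}[A_\beta]$ again yield $\eta \in \phi_{A_\beta}[A_\beta]$. Thus (4) is an equality.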


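VI.10.18 Remark. It is worth noting that $A_\beta = \bigcup_{\gamma < \beta}(\{\gamma\} \times \alpha) = \beta \times \alpha$; thus
VI.10.17 shows that

$$(\beta \times \alpha, <_\Sigma) \cong (\alpha \cdot \beta, <)$$

Note the order reversal of the “terms” $\alpha$ and $\beta$.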
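For natural numbers the isomorphism of VI.10.18 can be checked directly. The sketch below assumes finite $\alpha$ and $\beta$ (the values 3 and 4 and the name `lex_product` are arbitrary choices for illustration) and confirms that the pair $(\gamma, \delta)$ occupies position $\alpha \cdot \gamma + \delta$ in the lexicographic listing of $\beta \times \alpha$, as in equation (3) of the proof of VI.10.17.

```python
def lex_product(beta, alpha):
    """List beta x alpha in lexicographic order (first coordinate dominates)."""
    return [(gamma, delta) for gamma in range(beta) for delta in range(alpha)]


alpha, beta = 3, 4
pairs = lex_product(beta, alpha)

# Position of each pair in the listing, i.e., the value of the isomorphism.
position = {pair: index for index, pair in enumerate(pairs)}
```

Here `position[(gamma, delta)]` equals `alpha * gamma + delta` for every pair, and the listing has `alpha * beta` members: $\beta$ consecutive blocks, each a copy of $\alpha$.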


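VI.10.19 Proposition (Some Properties of Ordinal Multiplication).

(i) $0 \cdot \alpha = \alpha \cdot 0 = 0$
(ii) $1 \cdot \alpha = \alpha \cdot 1 = \alpha$
(iii) $\alpha \cdot 2 = \alpha + \alpha$
(iv) $\alpha \le \beta \to \alpha \cdot \gamma \le \beta \cdot \gamma$
(v) $\alpha > 0 \to (\alpha \cdot \beta < \alpha \cdot \gamma \leftrightarrow \beta < \gamma)$
(vi) $\alpha \cdot (\beta + \gamma) = \alpha \cdot \beta + \alpha \cdot \gamma$
(vii) $\alpha \cdot (\beta \cdot \gamma) = (\alpha \cdot \beta) \cdot \gamma$
(viii) $\alpha > 1 \land \beta > 1 \to \alpha + \beta \le \alpha \cdot \beta$
(ix) $\alpha \neq 0 \neq \beta \to \alpha \cdot \beta \neq 0$.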


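Proof. (i): $0 \cdot \alpha = \bigcup_{\beta \in \alpha}(0 \cdot \beta + 0) = 0$, using the I.H. that $0 \cdot \beta = 0$ for $\beta < \alpha$.
$\alpha \cdot 0 = 0$ by VI.10.13.


(ii):

$$\begin{aligned}
1 \cdot \alpha &= \bigcup_{\beta < \alpha}(1 \cdot \beta + 1) \\
&= \bigcup_{\beta < \alpha}(\beta + 1) && \text{under the obvious I.H.} \\
&= \{\beta : \beta < \alpha\} \\
&= \alpha
\end{aligned}$$

On the other hand, $\alpha \cdot 1 = \alpha \cdot 0 + \alpha = \alpha$ (VI.10.13 and VI.10.11(iv)).

(iii):

$$\begin{aligned}
\alpha \cdot 2 &= \bigcup_{\beta < 2}(\alpha \cdot \beta + \alpha) \\
&= (\alpha \cdot 0 + \alpha) \cup (\alpha \cdot 1 + \alpha) \\
&= \alpha \cup (\alpha + \alpha) \\
&= \alpha + \alpha, \quad \text{since } \alpha \subseteq \alpha + \alpha \text{ by VI.10.2.}
\end{aligned}$$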


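(iv): Fix $\alpha \le \beta$. We do induction on $\gamma$: Let $\gamma = 0$. Then $\alpha \cdot 0 = \beta \cdot 0$. Done.
Next, consider $\gamma = \lambda + 1$. Then

$$\begin{aligned}
\alpha \cdot (\lambda + 1) &= \alpha \cdot \lambda + \alpha \\
&\le \beta \cdot \lambda + \alpha && \text{by I.H. and VI.10.11(ii)} \\
&\le \beta \cdot \lambda + \beta && \text{by VI.10.2} \\
&= \beta \cdot (\lambda + 1)
\end{aligned}$$

Finally, let $\mathrm{Lim}(\gamma)$, and assume (I.H.) that $\alpha \cdot \lambda \le \beta \cdot \lambda$ for all $\lambda < \gamma$. Thus,
$\alpha \cdot \gamma = \bigcup_{\lambda < \gamma} \alpha \cdot \lambda \subseteq \bigcup_{\lambda < \gamma} \beta \cdot \lambda = \beta \cdot \gamma$.

(v): The $\leftarrow$ is by VI.10.14. The $\to$ is by VI.10.14 and trichotomy.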


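(vi): The result holds for $\alpha = 0$ (by (i)), so let $\alpha > 0$ and do induction on
$\gamma$. The case $\gamma = 0$ is $\alpha \cdot (\beta + 0) = \alpha \cdot \beta = \alpha \cdot \beta + 0 = \alpha \cdot \beta + \alpha \cdot 0$, using (i)
for the last “=”. Let $\gamma = \lambda + 1$, and assume the obvious I.H. Then

$$\begin{aligned}
\alpha \cdot (\beta + (\lambda + 1)) &= \alpha \cdot ((\beta + \lambda) + 1) \\
&= \alpha \cdot (\beta + \lambda) + \alpha \\
&= (\alpha \cdot \beta + \alpha \cdot \lambda) + \alpha && \text{by I.H.} \\
&= \alpha \cdot \beta + (\alpha \cdot \lambda + \alpha) \\
&= \alpha \cdot \beta + \alpha \cdot (\lambda + 1)
\end{aligned}$$

Finally, let $\mathrm{Lim}(\gamma)$, and assume the claim (I.H.) for all $\lambda < \gamma$. Now,

$$\begin{aligned}
\alpha \cdot (\beta + \gamma) &= \alpha \cdot (\beta + \sup\{\lambda : \lambda < \gamma\}) \\
&= \alpha \cdot (\sup\{\beta + \lambda : \lambda < \gamma\}) && \text{by continuity of } + \text{ (VI.5.36, VI.10.2)} \\
&= \sup\{\alpha \cdot (\beta + \lambda) : \lambda < \gamma\} && \text{by continuity of } \cdot \text{ (VI.5.36, VI.10.14; recall that } \alpha > 0\text{)} \\
&= \sup\{\alpha \cdot \beta + \alpha \cdot \lambda : \lambda < \gamma\} && \text{by I.H.} \\
&= \alpha \cdot \beta + \sup\{\alpha \cdot \lambda : \lambda < \gamma\} && \text{by continuity of } + \\
&= \alpha \cdot \beta + \alpha \cdot \gamma && \text{by VI.10.13}
\end{aligned}$$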


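(vii): We do induction on $\gamma$ (assume throughout that $\alpha \neq 0 \neq \beta$, since
otherwise the result reads $0 = 0$). When $\gamma = 0$, we get $\alpha \cdot (\beta \cdot 0) = \alpha \cdot 0 = 0$.
Also, $(\alpha \cdot \beta) \cdot 0 = 0$. Done.

Let $\gamma = \lambda + 1$. Then $\alpha \cdot (\beta \cdot (\lambda + 1)) = \alpha \cdot (\beta \cdot \lambda + \beta) = \alpha \cdot (\beta \cdot \lambda) + \alpha \cdot \beta =
(\alpha \cdot \beta) \cdot \lambda + \alpha \cdot \beta = (\alpha \cdot \beta) \cdot (\lambda + 1)$, where the second “=” uses distributivity,
and the third one the obvious I.H.


Let now $\mathrm{Lim}(\gamma)$. Then, arguing as in (vi), using continuity of $\cdot$,

$$\begin{aligned}
\alpha \cdot (\beta \cdot \gamma) &= \alpha \cdot (\beta \cdot \sup\{\lambda : \lambda < \gamma\}) \\
&= \alpha \cdot \sup\{\beta \cdot \lambda : \lambda < \gamma\} \\
&= \sup\{\alpha \cdot (\beta \cdot \lambda) : \lambda < \gamma\} \\
&= \sup\{(\alpha \cdot \beta) \cdot \lambda : \lambda < \gamma\} && \text{by I.H.} \\
&= (\alpha \cdot \beta) \cdot \gamma && \text{by VI.10.13.}
\end{aligned}$$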


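(viii): By Exercise VI.33,

$$\alpha + \beta = \bigcup\{\alpha + \gamma + 1 : \gamma < \beta\} \qquad (4)$$

under the assumptions. On the other hand,

$$\alpha \cdot \beta = \bigcup\{\alpha \cdot (\gamma + 1) : \gamma < \beta\} \qquad (5)$$

We thus take as I.H. that $1 < \alpha$ and $1 < \gamma$ yield $\alpha + \gamma \le \alpha \cdot \gamma$. Therefore, (4)
and (5) yield the result for $\beta > 1$, by the I.H. and the strict monotonicity of
$\lambda\gamma.\alpha + (\gamma + 1)$ and $\lambda\gamma.\alpha \cdot (\gamma + 1)$.

(ix): $\alpha \cdot \beta = \bigcup_{\gamma < \beta}(\alpha \cdot \gamma + \alpha) \supseteq \alpha$ by VI.10.11(ii) and the fact that the union
is nonempty. We have proved more under the assumptions: $0 < \alpha \le \alpha \cdot \beta$.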


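VI.10.20 Remark. Multiplication of ordinals is not commutative. For example,
$\omega \cdot 2 = \omega + \omega > \omega$. On the other hand, $2 \cdot \omega = \bigcup\{2 \cdot n : n \in \omega\}$, since $\mathrm{Lim}(\omega)$.
Now $\{2 \cdot n : n \in \omega\} \subseteq \omega$; hence

$$\bigcup\{2 \cdot n : n \in \omega\} \subseteq \omega$$

The $\subseteq$ above graduates to $=$, since the fact that $\mathrm{Lim}(2 \cdot \omega)$ precludes $\subset$. Thus,
$\omega \cdot 2 > 2 \cdot \omega$.

There are two more conclusions to be drawn from this example:

(1) The “right” distributive law does not hold. For example, $(1 + 1) \cdot \omega = 2 \cdot \omega =
\omega < \omega + \omega = \omega \cdot 2$.
(2) VI.10.19(iv) cannot be sharpened. Indeed, $1 < 2$, yet $1 \cdot \omega = 2 \cdot \omega$.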


VI.10.21 Proposition (Division with Remainder). Given $\alpha$ and $\beta > 0$, there
are unique ordinals $\pi$ and $\upsilon$ such that

(1) $\alpha = \beta \cdot \pi + \upsilon$,
(2) $\pi \le \alpha$ and $\upsilon < \beta$.

$\pi$ is the quotient and $\upsilon$ is the remainder of the division of $\alpha$ by $\beta$.

This extends the well-known theorem of Euclid from $\omega$ to all of On.
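Proof. Existence. If $\alpha = 0$, then take $\pi = \upsilon = 0$. So let $\alpha > 0$. Since $\beta \cdot 0 =
0 < \alpha$, the set $X = \{\theta : \beta \cdot \theta \le \alpha\}$ is not empty (why set?). If $\theta > \alpha$, then

$$\begin{aligned}
\beta \cdot \theta &\ge 1 \cdot \theta && \text{by VI.10.19} \\
&= \theta && \text{by VI.10.19} \\
&> \alpha
\end{aligned}$$

Hence $\theta \notin X$. Contrapositively, $\theta \in X \to \theta \le \alpha$. Thus $\sup X \le \alpha$. We claim
that

$$\pi = \sup X \qquad (1)$$

works. So far ($\pi \le \alpha$), so good. We want to propose a partnering $\upsilon$.
Now, using VI.5.36 and VI.10.19,

$$\beta \cdot \pi = \beta \cdot \sup X = \sup\{\beta \cdot \theta : \theta \in X\} \le \alpha \qquad (2)$$

By VI.10.11(v) and (2), there is a $\upsilon$ such that

$$\alpha = \beta \cdot \pi + \upsilon \qquad (3)$$

with

$$\upsilon = 0 \text{ iff (2) is equality} \qquad (4)$$

We next show that $\upsilon < \beta$. If not,

Case $\beta = \upsilon$. Then (3) yields (by VI.10.13) that $\alpha = \beta \cdot (\pi + 1)$; hence
$\pi + 1 \in X$, contradicting (1).

Case $\beta < \upsilon$. Then (VI.10.11(v)) there is a $\gamma > 0$ such that $\upsilon = \beta + \gamma$, and
(3) yields

$$\alpha = \beta \cdot \pi + (\beta + \gamma) = \beta \cdot (\pi + 1) + \gamma$$

again yielding $\pi + 1 \in X$ by VI.10.11(v), thus contradicting (1). We have
settled existence.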


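Uniqueness. Let

$$\alpha = \beta \cdot \pi + \upsilon = \beta \cdot \pi' + \upsilon' \qquad (5)$$

where $\upsilon < \beta$ and $\upsilon' < \beta$.

Let $\pi < \pi'$. Then $\pi' = \pi + \gamma$ for some $\gamma > 0$; hence

$$\alpha = \beta \cdot (\pi + \gamma) + \upsilon' = \beta \cdot \pi + (\beta \cdot \gamma + \upsilon')$$

Now

$$\upsilon < \beta \le \beta \cdot \gamma \le \beta \cdot \gamma + \upsilon'$$

Hence $\beta \cdot \pi + \upsilon < \beta \cdot \pi + (\beta \cdot \gamma + \upsilon') = \beta \cdot \pi' + \upsilon'$, contradicting (5).


Similarly, $\pi > \pi'$ is untenable; hence $\pi = \pi'$. Thus $\upsilon = \upsilon'$ by VI.10.11(iii).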
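Two small instances may help fix the roles of $\pi$ and $\upsilon$. They are illustrations added here, not taken from the text; the second one uses $2 \cdot \omega = \omega$ from VI.10.20.

```latex
\begin{align*}
\omega \cdot 3 + 5 &= \omega \cdot 3 + 5
  && (\beta = \omega,\ \pi = 3,\ \upsilon = 5 < \omega) \\
\omega + 1 &= 2 \cdot \omega + 1
  && (\beta = 2,\ \pi = \omega,\ \upsilon = 1 < 2)
\end{align*}
```

In the second line the quotient $\pi = \omega$ is infinite although the divisor is finite, and $\pi \le \alpha$ holds with $\pi < \alpha = \omega + 1$. Dividing $\alpha = \omega$ itself by $2$ gives $\pi = \omega = \alpha$ and $\upsilon = 0$, so the bound $\pi \le \alpha$ cannot be made strict.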


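VI.10.22 Corollary. For $\alpha > 0$, $\mathrm{Lim}(\alpha)$ iff $(\exists\beta)\, \omega \cdot \beta = \alpha$ iff $2 \cdot \alpha = \alpha$.


Proof. Let $\mathrm{Lim}(\alpha)$, and divide $\alpha$ by $\omega$ to get (by VI.10.21)

$$\alpha = \omega \cdot \pi + \upsilon$$

Now, $\upsilon < \omega$; hence it is a natural number. If $\upsilon > 0$, then $\upsilon = n + 1$ ($n \in \omega$);
hence $\alpha = \omega \cdot \pi + (n + 1) = (\omega \cdot \pi + n) + 1$, a successor. This contradicts the
assumption, so $\upsilon = 0$.

Next, say $0 < \alpha = \omega \cdot \beta$. Then $2 \cdot \alpha = 2 \cdot (\omega \cdot \beta) = (2 \cdot \omega) \cdot \beta = \omega \cdot \beta = \alpha$,
where we have used VI.10.20 in the penultimate “=”.


Finally, let $0 < \alpha = 2 \cdot \alpha$, and suppose that $\alpha = \beta + 1$ for some $\beta$. Then

$$\begin{aligned}
\alpha &= 2 \cdot (\beta + 1) \\
&= 2 \cdot \beta + 2 \\
&\ge 1 \cdot \beta + 2 && \text{by VI.10.19} \\
&= \beta + 2 \\
&> \beta + 1 = \alpha
\end{aligned}$$

a contradiction. So $\mathrm{Lim}(\alpha)$.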


We conclude this section with the operation of exponentiation on ordinals,
once again using the corresponding operation on the natural numbers for motivation.
Therefore we will iterate multiplication.

VI.10.23 Definition (Exponentiation of Ordinals). The unique function
$P(\alpha, \beta)$ defined by the recursion below will be denoted by $\alpha^{\cdot\beta}$:

$$\begin{aligned}
P(\alpha, 0) &= 1 \\
P(\alpha, \beta + 1) &= P(\alpha, \beta) \cdot \alpha \\
\text{for } \mathrm{Lim}(\beta),\quad P(\alpha, \beta) &= \sup\{P(\alpha, \gamma) : \gamma < \beta\}
\end{aligned}$$

The “$\cdot$” in $\alpha^{\cdot\beta}$ is an annoying convention used to distinguish ordinal exponentiation
from cardinal exponentiation, the latter being (usually) privileged not to
need the “$\cdot$”.
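A few values computed straight from the three clauses of VI.10.23 show how the recursion behaves; the particular bases and exponents are chosen here only for illustration, and the steps use VI.10.19(ii) and (iii).

```latex
\begin{align*}
\omega^{\cdot 1} &= \omega^{\cdot 0} \cdot \omega = 1 \cdot \omega = \omega \\
\omega^{\cdot 2} &= \omega^{\cdot 1} \cdot \omega = \omega \cdot \omega \\
2^{\cdot \omega} &= \sup\{2^{\cdot n} : n < \omega\} = \omega \\
2^{\cdot(\omega + 1)} &= 2^{\cdot \omega} \cdot 2 = \omega \cdot 2 = \omega + \omega
\end{align*}
```

The third line is where ordinal and cardinal exponentiation part ways: as an ordinal power, $2^{\cdot\omega}$ is just the supremum of the finite powers of $2$, namely $\omega$.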


So far we have kept the notation for ordinal operations “natural”, by using 
the same symbolism as that used on the natural numbers. As justification we 
invoke priority: In our development we first developed the natural numbers, and 
we denoted $n \cup \{n\}$ by $n + 1$ (as most people justifiably do), which induced (or
forced) our subsequent notation, $+$ and $\cdot$, first on the natural numbers and then
on ordinals. We decided that we did not want to ask the reader to write m+n 
from some point onwards, when he means to add m and n. Note, however, that 
some authors use + and e respectively, although they may start off with the 
notation $n + 1$ for $n \cup \{n\}$.


It is clear that the above recursion also defines exponentiation over $\omega$, provided
the case $\mathrm{Lim}(\beta)$ is dropped, as it is irrelevant over $\omega$.


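VI.10.24 Remark. If $\alpha > 0$, then $\alpha^{\cdot\beta} > 0$. Indeed, $\alpha^{\cdot 0} = 1 > 0$. Furthermore,
assuming that $\alpha^{\cdot\beta} > 0$ (I.H.), we get that

$$\alpha^{\cdot(\beta + 1)} = \alpha^{\cdot\beta} \cdot \alpha > 0$$

by VI.10.19(ix). Finally, if $\mathrm{Lim}(\beta)$ and $\alpha^{\cdot\gamma} > 0$ for all $\gamma < \beta$ (I.H.), then

$$\alpha^{\cdot\beta} = \sup\{\alpha^{\cdot\gamma} : \gamma < \beta\} > 0$$


VI.10.25 Proposition. If $\alpha > 0$, then $\lambda\beta.\alpha^{\cdot\beta}$ is weakly normal. If $\alpha > 1$, then
it is normal.


Proof. Since $\alpha^{\cdot\beta} > 0$ by VI.10.24, we have $\alpha^{\cdot(\beta + 1)} = \alpha^{\cdot\beta} \cdot \alpha \ge \alpha^{\cdot\beta}$ by VI.10.19,
with “$>$” if $\alpha > 1$. The result follows from VI.5.38 and VI.5.39.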


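VI.10.26 Corollary. If $\alpha > 1$ and $\mathrm{Lim}(\beta)$, then $\mathrm{Lim}(\alpha^{\cdot\beta})$.


VI.10.27 Proposition (Some Properties of Ordinal Exponentiation).

(i) $0^{\cdot\alpha} = 0$ iff $\alpha$ is a successor; otherwise $0^{\cdot\alpha} = 1$
(ii) $1^{\cdot\alpha} = 1$
(iii) $\alpha > 1 \to (\alpha^{\cdot\beta} < \alpha^{\cdot\gamma} \leftrightarrow \beta < \gamma)$
(iv) $\alpha > 0 \to \alpha^{\cdot(\beta + \gamma)} = \alpha^{\cdot\beta} \cdot \alpha^{\cdot\gamma}$
(v) $\alpha > 0 \to (\alpha^{\cdot\beta})^{\cdot\gamma} = \alpha^{\cdot(\beta \cdot \gamma)}$
(vi) $\alpha > 1 \land \beta > 0 \to 1 < \alpha^{\cdot\beta}$
(vii) $\alpha > 1 \land \beta > 1 \to \alpha \cdot \beta \le \alpha^{\cdot\beta}$.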


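Proof. (i): $0^{\cdot(\beta + 1)} = 0^{\cdot\beta} \cdot 0 = 0$ by VI.10.19. By VI.10.23, $0^{\cdot 0} = 1$. Now if
$\mathrm{Lim}(\beta)$, then $0^{\cdot\beta} = \sup\{0^{\cdot\gamma} : \gamma < \beta\} \ge 1$, since $0 < \beta$. Assume next (I.H.)
that whenever $\gamma < \beta$ and $\mathrm{Lim}(\gamma)$, then $0^{\cdot\gamma} = 1$. It follows that $0^{\cdot\beta} = \sup\{0^{\cdot\gamma} :
\gamma < \beta\} \le 1$.

For (ii), $1^{\cdot 0} = 1$, and $1^{\cdot(\alpha + 1)} = 1^{\cdot\alpha} \cdot 1 = 1 \cdot 1 = 1$ using the obvious I.H.
Finally, let $\mathrm{Lim}(\alpha)$ and $1^{\cdot\beta} = 1$ for all $\beta < \alpha$ (I.H.). Then $1^{\cdot\alpha} = \sup\{1^{\cdot\beta} :
\beta < \alpha\} = \sup\{1\} = 1$.

(iii): By VI.10.25 and trichotomy.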

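(iv): We do induction on $\gamma$. First, $\alpha^{\cdot(\beta + 0)} = \alpha^{\cdot\beta} = \alpha^{\cdot\beta} \cdot 1 = \alpha^{\cdot\beta} \cdot \alpha^{\cdot 0}$. This
settles the basis. Next,

$$\begin{aligned}
\alpha^{\cdot(\beta + (\gamma + 1))} &= \alpha^{\cdot((\beta + \gamma) + 1)} \\
&= \alpha^{\cdot(\beta + \gamma)} \cdot \alpha && \text{by VI.10.23} \\
&= (\alpha^{\cdot\beta} \cdot \alpha^{\cdot\gamma}) \cdot \alpha && \text{by the obvious I.H.} \\
&= \alpha^{\cdot\beta} \cdot (\alpha^{\cdot\gamma} \cdot \alpha) \\
&= \alpha^{\cdot\beta} \cdot \alpha^{\cdot(\gamma + 1)} && \text{by VI.10.23}
\end{aligned}$$

Finally, let $\mathrm{Lim}(\gamma)$, and assume the claim for all $\delta < \gamma$. By VI.5.35 and
VI.10.25, $\lambda\beta.\alpha^{\cdot\beta}$ is continuous. Therefore,

$$\begin{aligned}
\alpha^{\cdot(\beta + \gamma)} &= \alpha^{\cdot(\beta + \sup\{\delta : \delta < \gamma\})} \\
&= \alpha^{\cdot\sup\{\beta + \delta : \delta < \gamma\}} \\
&= \sup\{\alpha^{\cdot(\beta + \delta)} : \delta < \gamma\} \\
&= \sup\{\alpha^{\cdot\beta} \cdot \alpha^{\cdot\delta} : \delta < \gamma\} && \text{by the I.H.} \\
&= \alpha^{\cdot\beta} \cdot \sup\{\alpha^{\cdot\delta} : \delta < \gamma\} && \text{by continuity of } \cdot \text{ and by } \alpha^{\cdot\beta} > 0 \\
&= \alpha^{\cdot\beta} \cdot \alpha^{\cdot\gamma}
\end{aligned}$$

(v): A similar routine calculation proves this case.

(vi): $1 = \alpha^{\cdot 0}$. Now use (iii).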


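(vii): Going for a contradiction, let $\beta > 1$ be smallest for which (vii) fails,
i.e.,

$$\alpha^{\cdot\beta} < \alpha \cdot \beta \qquad (1)$$

Is it the case that $\mathrm{Lim}(\beta)$? If so, $\alpha \cdot \beta = \sup\{\alpha \cdot \delta : \delta < \beta\}$ and $\mathrm{Lim}(\alpha \cdot \beta)$
(cf. VI.10.15); thus, by (1), there is a $\delta < \beta$ such that

$$\alpha^{\cdot\beta} < \alpha \cdot \delta \qquad (2)$$

By (2),

$$\delta > 1$$

since otherwise $\alpha^{\cdot\beta} < \alpha = \alpha^{\cdot 1}$, contradicting (iii) (recall, $\alpha > 1$, $\beta > 1$). Now,
by (iii),

$$\alpha^{\cdot\delta} < \alpha^{\cdot\beta} < \alpha \cdot \delta, \quad \text{by (2)}$$

This contradicts the minimality of $\beta$, so we must have $\beta = \gamma + 1$ for some $\gamma$.
Clearly, $\gamma > 1$; otherwise $\gamma = 1$ (why?) and (1) says that

$$\alpha \cdot \alpha = \alpha^{\cdot 2} < \alpha \cdot 2 = \alpha + \alpha$$

which cannot be, by VI.10.19(viii). By minimality of $\beta$,

$$\alpha \cdot \gamma \le \alpha^{\cdot\gamma}$$

Hence

$$(\alpha \cdot \gamma) \cdot \alpha \le \alpha^{\cdot\gamma} \cdot \alpha = \alpha^{\cdot\beta} \qquad (3)$$

Since $\beta = \gamma + 1 \le \gamma + \alpha \le \gamma \cdot \alpha$ by VI.10.19(viii), (3) yields

$$\alpha \cdot \beta \le \alpha \cdot (\gamma \cdot \alpha) = (\alpha \cdot \gamma) \cdot \alpha \le \alpha^{\cdot\beta}$$

contradicting (1).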


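VI.10.28 Example. $(\alpha \cdot \beta)^{\cdot\gamma} \neq \alpha^{\cdot\gamma} \cdot \beta^{\cdot\gamma}$ in general. For example,

$$(2 \cdot 2)^{\cdot\omega} = 4^{\cdot\omega} = \omega \qquad (1)$$

while

$$2^{\cdot\omega} \cdot 2^{\cdot\omega} = \omega \cdot \omega = \omega^{\cdot 2} \qquad (2)$$

We see that the right hand sides of (1) and (2) are different by VI.10.27(iii);
therefore so are the left hand sides.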
VI.10.29 Remark. Since $\lambda\alpha.\omega^{\cdot\alpha}$ is normal, it has arbitrarily large fixed points.
Such fixed points were called “$\epsilon$-numbers” by Cantor.


In particular, the reader can readily verify that $\sup\{\omega, \omega^{\cdot\omega}, \omega^{\cdot\omega^{\cdot\omega}}, \ldots\}$ is an
$\epsilon$-number, the smallest one above $\omega$.


Additional material on the arithmetic of ordinals can be found in the Exercises
section and in the references Bachmann (1955), Kamke (1950), Levy
(1979), Monk (1969), Sierpinski (1965), Tarski (1956).


VI.11. Exercises


VI.1. If $P$ has MC and $Q \subseteq P$, then $Q$ has MC.

VI.2. If $P$ is left-narrow, then $P^{n}(a)$ is a set for all $n > 0$ in $\omega$ and all $a$.

VI.3. Prove that if an order $<$ is left-narrow, then it has MC iff every nonempty
set has a $<$-minimal element.
(Hint. Only the if part is non-trivial. Start with $a \in A$. If $a$ is minimal,
fine. Else show that any minimal element in the set $<(a) \cap A$ is minimal
in $A$.)

VI.4. Prove that if a relation $P$ is left-narrow, then it has MC iff every nonempty
set has a $P$-minimal element.
(Hint. Work with $P^{+}$.)

VI.5. Prove the claims in the specially marked passage in Remark VI.2.26.

VI.6. Prove that if $A \subseteq B$ and $A$ is a transitive set, then $C(B, x) = x$ for all
$x \in A$, where $C$ is Mostowski’s collapsing function (VI.2.38).
(Hint. Use $\in$-induction.)

VI.7. With reference to VI.4.3, prove that $[(\{1\}, <)]_{\cong}$ is a proper class, where
“$<$” is the standard order on $\omega$.

VI.8. Prove that the relation $<$ defined on On in VI.4.9 is transitive.

VI.9. Prove Theorem VI.4.30 directly by explicitly using the recursion (1)
of VI.4.32.

VI.10. Prove VI.3.23 for WO sets by using the comparability of ordinals and
Theorem VI.4.30 (proved via VI.4.32).

VI.11. For $\alpha > 0$ prove that $\mathrm{Lim}(\alpha)$ iff for all $\beta < \alpha$ there is a $\gamma$ such that
$\beta < \gamma < \alpha$.

VI.12. For each $\alpha \neq 0$ show that $\sup^{+} \alpha = \alpha$.

VI.13. Prove that $\mathrm{Lim}(\alpha)$ iff $\alpha \neq 0$ and $\sup \alpha = \alpha$.


VI.15. 


VI.16. 


VI.17. 
VI.18. 


VI.19. 


VI.20. 
VI.21. 
VI.22. 
VI.23. 
VI.24. 


VI.25. 
VI.26. 
VI.27. 
VI.28. 
VI.29. 
VI.30. 
VI.31. 


VI.32. 
VI.33. 
VI.34. 
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Prove that if a set @ 4 A C On does not have a maximum, then sup(A) 
is a limit ordinal. 


Prove that there are arbitrarily large limit ordinals, that is, for each a 
there is a 6 > a such that Lim(8). This problem addresses questions 
raised (and answers promised) following V.1.2. 


(Hint. By induction over w define the sequence f (0) = a and f(n+1) = 
f(™) + 1. Argue that (1): if 6 = supran(f), then Lim(8); and (2): 
a < p.) 

Prove that if f is a weakly continuous On-sequence, of ordinals that 
moreover satisfies (Va) f(a) < f(a +1), then f is non-decreasing and 
hence is weakly normal. 


Prove that the composition of normal functions is normal. 


Prove that if f is a normal transfinite On-sequence, then, for any y, it 
has a fixed point 8 such that y < 6. Check that your proof (along the 
lines of that for the Knaster-Tarski theorem) furnishes the smallest fixed 
point greater than y. 


Prove the Knaster-Tarski fixpoint theorem (VI.5.47) under the weakened 
assumption that in the PO set (A, <) every chain has a least upper bound. 


Refer to the proof of VI.5.47. For the y chosen, prove that s<, = sy. 
Prove that for all a, sp(Vy(@)) CN. 

Show that, for all a, 8, Vy(~) € Vy(B) implies a < B. 

Show that p(Vy(@)) =a+ 1. 


Define “standard rank” by rky(x) = minfa:x C N U Vy(a)}. Show 
that rky(@) =a. 


Relate py and rky. Show that for all sets x, opy(x) = rky(x) + 1. 
Complete the proof of VI.7.2. 

Substantiate the comment made following VI.7.5. 

Show for the J of Section VI.7 that J(a, 8) = {J(o, T): (a, T) < (a, B)}. 
Show for the < of Section VI.7 that #7 = <((0, «)) for all a. 

Prove that the function Aw.J[a x a] is increasing (order-preserving). 
Prove that the function Aw./J[a x a] has arbitrarily large fixed points. 
(Hint. Prove that it is normal.) 

Prove that ordinal addition is absolute for transitive models of ZF. 
Prove that if B > 0, thena + 8 = supt{a+y:y < B}. 


Prove thatl +a =a iffw<a. 
VI.35. Let (X_β, <_β) ≅ (Y_β, <_β) for all β < α. Show that

    cat_{β<α}(X_β, <_β) ≅ cat_{β<α}(Y_β, <_β)

VI.36. Let (X, <_1) and (Y, <_2) be disjoint WO sets. Define < on X ∪ Y by

    a < b iff {a, b} ⊆ X ∧ a <_1 b ∨
              {a, b} ⊆ Y ∧ a <_2 b ∨
              a ∈ X ∧ b ∈ Y

Show that (X, <_1) + (Y, <_2) ≅ (X ∪ Y, <).

VI.37. (Left cancellation in ordinal addition.) Show that if α + β = α + γ,
then β = γ.

VI.38. (Right cancellation in ordinal addition does not hold.) Show by an
example that β + α = γ + α does not imply β = γ.

VI.39. Prove that ordinal multiplication is absolute for transitive models of ZF.

VI.40. Show that α > 0 implies β ≤ α · β.

VI.41. Show that if α > 0 and β > 1, then α < α · β.

VI.42. (Left cancellation in ordinal multiplication.) Show that if α > 0 and
α · β = α · γ, then β = γ.

VI.43. (Right cancellation in ordinal multiplication does not hold.) Show by
an example that α > 0 ∧ β · α = γ · α does not imply β = γ.

VI.44. For any 0 < n ∈ ω, show that n · ω = ω.

VI.45. Show that α > 0 is a limit iff for every 0 < n < ω, α = n · α.

VI.46. Prove that ordinal exponentiation is absolute for transitive models of
ZF.

VI.47. Show that α^1 = α and α^2 = α · α.

VI.48. Prove VI.10.27(v).

VI.49. Prove that for all m, n in ω, m^n ∈ ω.

VI.50. Prove that for all m, n, k in ω, (m · n)^k = m^k · n^k.

VI.51. Show for all α > 1 that α^β ≥ β. Can ≥ be sharpened to >? Why?

VI.52. Prove that Lim(α) implies that (α + n)^ω = α^ω for all n ∈ ω.

VI.53. Prove that if the formula 𝒜 has no quantifiers, then it is absolute for
any class M.

In the following few exercises models of fragments of ZFC are being sought.
We mean (U, ∈)-models.
Show that V(ω) is a model for ZFC−infinity (i.e., satisfies all the ZFC
axioms except that for infinity). Indeed, infinity fails here. By the way,
this shows that infinity is not implied by the remaining axioms.

Show that V(α) is a model for ZFC−collection, for any limit ordinal
α > ω.

Find a limit ordinal α > ω such that the collection axiom is false in V(α).
By the way, this shows that collection is not implied by the remaining
axioms.

(Hint. Experiment with α = ω + ω.)

Prove that the structure (On, U, ∈) is not a model of ZFC.

Let N be a set of urelements such that f : N → n is a 1-1 correspondence
for some n ∈ ω. Prove that V_N(ω) is a model of ZFC−infinity. (It fails
infinity.)

Let N be a set of urelements such that f : N → ω is a 1-1 correspondence.
Prove that V_N(ω) is a model of ZFC−{infinity, collection}. (It
fails both infinity and collection.)

Show that for any transitive class M, P^M(A) = M ∩ P(A).

Complete the proof of Lemma VI.8.16.

Prove that L_N satisfies global choice. That is, show that there is a function
F on L_N such that for any set ∅ ≠ x ∈ L_N, F(x) ∈ x.


VII


Cardinality 


In Chapter VI, among other things, we studied the WO sets and learnt how to 
measure their length with the help of ordinal numbers. A consequence of the 
axiom of choice was (Theorem VI.5.50) that every set can be well-ordered and 
therefore every set can be assigned a length.†

In the present chapter we turn to another aspect of set size, namely its 
number of elements, or cardinality. It will turn out that for finite sets length 
and cardinality are measured by the same (finite) ordinal; thus, in particu- 
lar, finite sets have a unique length. As was already remarked, the situation 
with infinite sets is much less clean intuitively, and several WO sets of differing
lengths can have the same number of elements (e.g., ω, ω + 1, ω + 2,
etc.).

The following section will formalize the notions of “finite” and “infinite” 
sets. Intuitively, a set is finite if the process of removing its elements, one at a 
time, will terminate; it is infinite otherwise.‡

Thus for finite sets the process implicitly assigns the numbers 1, 2, 3,... to 
the first, second, third, ... removed items. Since the process terminates, there 
will be a natural number assigned to the last removed item. Evidently this 
number equals the cardinality, or number of elements, of the set. 


† This length is not unique in general. For example, the set ω ∪ {ω} can be assigned both the lengths
ω + 1 and ω.

‡ This intuitive idea is for motivation only, and it will not be used anywhere except in the informal
discussion here. One can easily get into trouble if “time” is taken too literally. For example, let
us deplete ω in finite time! Start by removing 0. Exactly 1 hour later remove 1; exactly 1/2 hour
later remove 2; ...; exactly 1/2^n hour later remove n + 1; and so on, in the obvious pattern. It
takes just 1 + 1/2 + 1/2^2 + ··· + 1/2^n + ··· = 2 hours to complete the task. Yet, ω is intuitively
infinite. Of course, we would not have had this informal “paradox” if we were careful to say
explicitly that we spend exactly the same amount of time between any two consecutive removals
of elements.
In the infinite case it is not clear a priori how to assign a “number” that
denotes the cardinality of the set. Thus the issue is temporarily postponed, and 
one first worries about whether or not two infinite sets have the same number of 
elements. This is an easier problem to address, and it can be addressed before 
we settle the question of what “number of elements” means. Indeed, any two 
sets (infinite or not) clearly have the same number of elements if we can match 
each element of one with a unique element of the other in such a way that 
no unmatched elements are left on either side. Technically, two sets have the 
same cardinality iff there is a 1-1 correspondence between them. Let us now 
formalize this discussion and see where it leads.


VII.1. Finite vs. Infinite 


VII.1.1 Definition. Two sets A and B are said to be equinumerous or equipotent,
in symbols A ~ B, iff there is a 1-1 correspondence f : A → B.
The negation of ~ is denoted by ≁.

VII.1.2 Exercise (Informal). Show that ~ is an equivalence relation on the
universe (of sets) U_N.

(Hint. For reflexivity note that the function ∅ : ∅ → ∅ proves the special case
∅ ~ ∅.)

VII.1.3 Definition. A set A is finite iff A ~ n, where n ∈ ω. We call n the
cardinality or cardinal number of A, and write |A| = n in this case.

A set which is not finite is infinite.

VII.1.4 Remark. According to the above definition and the hint in Exercise
VII.1.2, ∅ is finite. Furthermore, each n ∈ ω is finite, and |n| = n.

Corollary VII.1.8 below shows that the cardinality of a finite set is indeed
unique, for it is impossible to have n ~ A ~ m and n ≠ m with m and n in ω.

VII.1.5 Example. Let E denote the set of even numbers. Then E ~ ω.

Indeed, the function f : ω → E given by f = λn.2n is a 1-1 correspondence.


We now embark on showing that the definition of finite set is “reasonable”, 
that is, it leads to the (intuitively) expected properties of finite sets. 


VII.1.6 Proposition. If x ⊂ n ∈ ω, then there is no onto function f : x → n.

x is not necessarily a natural number, so that “⊂” here is more general than “<”.

Proof. Induction on n in ω − {0}.

Basis: n = 1: Then the only function f : x → n is ∅ : ∅ → 1, which is clearly
not onto.

I.H.: Assume that if n = k, then there is no onto function f : x → n for any
x ⊂ n.

We show, by contradiction, that the situation is unchanged when n = k + 1.
Let instead f : x → k + 1 be onto, where x ≠ ∅.† So let H = f^{-1}(k). By
ontoness, ∅ ≠ H, and, of course, H ⊆ x.

Case 1. k ∉ x. Then g = f ↾ (x − H) is onto k, and x − H ⊂ k, contradicting
the I.H.

Case 2. k ∈ x. If f(k)↑, then use x − {k} and go back to Case 1. Otherwise,
let k ≠ m = f(k) and set g = (f − ({⟨k, m⟩} ∪ (H × {k}))) ∪ (H × {m}).
Then g : x − {k} → k is onto, which contradicts the I.H. Finally, if k = f(k),
then f − (H × {k}) : x − H → k is onto, and we have contradicted the
I.H. once more.


VII.1.7 Corollary. If x ⊂ n ∈ ω, then x ≁ n.

VII.1.8 Corollary. If m < n ∈ ω, then m ≁ n.

One refers to Corollary VII.1.8 as the pigeon-hole principle, in that if you have
n pigeons and m holes (or vice versa) then there is no way to put exactly one
pigeon in each hole so that no pigeon is left out (and no hole is empty).
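
For small numbers the pigeon-hole principle can be checked by brute force. The following Python sketch is an informal illustration only and plays no role in the formal development. It models the natural number n as range(n); the names bijections and equinumerous are ours, not the text's.

```python
from itertools import product

def bijections(m, n):
    """Yield every 1-1 correspondence from {0,...,m-1} onto {0,...,n-1}.

    A function is represented by the tuple of its images (f(0), ..., f(m-1)).
    """
    for images in product(range(n), repeat=m):
        # 1-1: no image is repeated; onto: every element of n is hit.
        if len(set(images)) == m and set(images) == set(range(n)):
            yield images

def equinumerous(m, n):
    """True iff some 1-1 correspondence m -> n exists (i.e., m ~ n)."""
    return next(bijections(m, n), None) is not None

# Exhaustive check of Corollary VII.1.8 for all m < n < 6.
no_counterexample = all(
    not equinumerous(m, n) for n in range(6) for m in range(n)
)
```

The search is exponential in m, so it only serves to make the statement concrete for very small m and n.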


VII.1.9 Proposition. If x ⊆ n ∈ ω, then x is finite and |x| ≤ n.

x is not necessarily a natural number, so that “⊆” here is more general than “≤”.

Proof. x is well-ordered by ∈ (or <) as a subset of n. Thus, for some α,
(x, ∈) ≅ (α, ∈)

In particular, α ~ x, say, via the 1-1 correspondence f : α → x.

We claim that α ≤ n. Suppose instead that n < α. Let y = ran(f ↾ n). Then n ~ y
via f ↾ n, and y ⊂ x ⊆ n, contradicting VII.1.7. Thus, α ≤ n. That is, α is a
natural number m, x ~ m, and m ≤ n.

† The case x = ∅ cannot lead to onto functions, as seen in the basis step; therefore it is not
considered here.
VII.1.10 Corollary. If A is finite and B ⊆ A, then B is finite and |B| ≤ |A|.

Proof. Let f : A → n be a 1-1 correspondence for some n ∈ ω. Then
B ~ ran(f ↾ B) ⊆ n.

By Proposition VII.1.9, ran(f ↾ B) ~ m for some m ≤ n; thus B ~ m (by
transitivity of ~).

VII.1.11 Corollary. A is finite iff there is an onto f : n → A for some n ∈ ω.

Proof. If part. Let f : n → A be onto. Define g on A by g(x) = min f^{-1}(x).
Then g : A → ran(g) ⊆ n is a 1-1 correspondence; hence |A| = |ran(g)| ≤ n.
Only-if part. Let f : n → A be a 1-1 correspondence. Then f is onto.

It is clear from the proof of the if part that f need not be total.

VII.1.12 Corollary. If A and B are finite, then A ~ B iff |A| = |B|.

Proof. If part. Let |A| = |B| = n ∈ ω. Let h : n → A and g : n → B be 1-1
correspondences. Then g ∘ h^{-1} : A → B is a 1-1 correspondence.

Only-if part. Let f : A → B, h : A → n and g : B → m be 1-1 correspondences,
where m < n. The diagram below establishes m ~ n (via g ∘ f ∘ h^{-1} : n → m),
a contradiction:

             f
        A -------> B
        |          |
      h |          | g
        v          v
        n          m


VII.1.13 Proposition. For all n ∈ ω there is no f such that f : n → ω is onto.

Proof. Induction on n.

Basis: n = 0: The result is immediate.

I.H.: Assume the assertion for n ≤ k. Proceeding by contradiction, assume
that f : k + 1 → ω is onto and let H = f^{-1}(0). Hence k + 1 ⊇ H ≠ ∅
by ontoness. Thus, k + 1 − H ~ m < k + 1 for some m, by VII.1.9. Let
g : m → k + 1 − H be a 1-1 correspondence. The diagram below shows that
h ∘ g : m → ω is onto, contradicting I.H., since m ≤ k:

         g                  h
    m -------> k + 1 − H -------> ω,    where h = λx.(f ↾ (k + 1 − H))(x) − 1
VII.1.14 Corollary. ω is infinite.

Before we turn our attention to infinite sets, we will look into finite sets more
carefully, at the same time establishing a few facts about inductively defined sets 
and a technique of proving properties of finite sets by some sort of induction. 


VII.1.15 Definition. Given a class S and an n-ary function f for some n ∈ ω,
we say that S is closed under f, or is f-closed, iff ran(f ↾ S^n) ⊆ S.

If f is a set, then it is called an n-ary operation, or rule, on S. If ℱ is a set
of rules and S is a class, then “S is ℱ-closed” means that S is closed under all
f ∈ ℱ.

The reader is familiar with operations on sets. For example, + is a total 2-ary
operation on the real numbers, ℝ. We prefer to call 2-ary operations binary.
Also, λx.1/x is a nontotal 1-ary operation on ℝ. We call 1-ary operations unary.

We also note that ∅ is closed under any f and that if for a choice of an n-ary
f and class S it is the case that ran(f ↾ S^n) = ∅, then S is f-closed.


The requirement that “operations” (or “rules”) be sets does not limit the
range of applicability of the concept, while it simplifies the technicalities. For
example, it is meaningful to have a set of rules, since, so restricted, they are
objects which can be collected into a class or set. Further justification for this
restriction is embedded in the proof of Proposition VII.1.19 below. The material
below formalizes work presented informally in Section I.2 to bootstrap our
theory.


VII.1.16 Definition. Let a set ℐ and a set of operations ℱ be given. We say that
a set S is inductively, or recursively, defined by ℐ (the initial objects) and ℱ
(the set of operations or rules) iff S is the ⊆-smallest set that satisfies both of
the following conditions:

(a) ℐ ⊆ S.
(b) S is ℱ-closed.

Under these conditions, we also say that S is the closure of ℐ under ℱ, in
symbols S = Cl(ℐ, ℱ).

VII.1.17 Remark. We clarify “⊆-smallest”: If after replacing S by the set T
in (a) and (b) we find that T satisfies (a) and (b), then S ⊆ T.

VII.1.18 Example. ω is inductively defined by ℐ = {0} and ℱ = {λx.x ∪ {x}},
where we may take as dom(λx.x ∪ {x}) ω itself, or any ordinal α such
that ω < α.
Indeed, first, ω satisfies (a) and (b) of Definition VII.1.16. Secondly, let a set
T also satisfy (a) and (b) with respect to the given ℐ and ℱ.

That is, 0 ∈ T and (∀x)(x ∈ T → x ∪ {x} ∈ T). By induction over ω, ω ⊆ T;
ω is ⊆-smallest.

VII.1.19 Proposition. Given the sets ℐ and ℱ of Definition VII.1.16, a unique
set Cl(ℐ, ℱ) exists and is equal to ⋂_{x∈J} x, where J is the class of all sets x
satisfying (a) and (b).

Proof. First, let X = ℐ ∪ ⋃_{f∈ℱ} ran(f). By Exercise VII.3, X is a set that
satisfies (a) and (b). Thus X ∈ J, and hence S = ⋂_{x∈J} x is a set, being a subclass
of X.

Next, it is easy to verify that S satisfies (a) and (b). (See Exercise VII.3.)
Finally, S is ⊆-smallest, for if a set Q satisfies (a) and (b), then Q ∈ J.

The above establishes existence. For uniqueness use the ⊆-smallest property.
(See Exercise VII.3.)
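
When the closure happens to be finite, it can be computed by saturating the initial objects under the rules. The Python sketch below is an informal illustration under that finiteness assumption, not the intersection construction of the proposition. The representation of a rule as an (arity, function) pair, the use of None for "undefined", and the name closure are all ours.

```python
from itertools import product

def closure(initial, rules):
    """Return Cl(initial, rules) for rules given as (arity, function) pairs.

    A function may return None to signal that it is undefined at the given
    arguments (a nontotal operation). The loop stops only if the closure
    is finite.
    """
    closed = set(initial)              # condition (a): initial objects are in
    changed = True
    while changed:                     # repeat until nothing new is produced
        changed = False
        for arity, f in rules:
            # tuple(closed) is a snapshot, so adding to `closed` is safe here.
            for args in product(tuple(closed), repeat=arity):
                value = f(*args)
                if value is not None and value not in closed:
                    closed.add(value)  # enforce condition (b)
                    changed = True
    return closed

# Successor restricted to arguments below 5: the closure of {0} is {0,...,5}.
below_six = closure({0}, [(1, lambda x: x + 1 if x < 5 else None)])

# Addition modulo 12 starting from 3: the closure is {0, 3, 6, 9}.
multiples_of_three = closure({3}, [(2, lambda x, y: (x + y) % 12)])
```

The result satisfies (a) and (b) by construction, and every element added is forced into any set T satisfying (a) and (b), which is the ⊆-smallest property in this finite setting.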


We next note that there are two reasons justifying the term “inductively
defined”, or “recursively defined”, for sets such as S = Cl(ℐ, ℱ).

First, the set S is defined in terms of (“smaller”, or “earlier”, instances of)
itself (starting with ℐ). For, (b) of Definition VII.1.16 says that if we know S
up to a certain “extent”, or “stage”, then we can enlarge S by applying to its
current version the operations in ℱ.

Second, the definition allows us to prove properties of all elements of S by 
induction with respect to the formation, or definition, of S. We also say, by 
induction over S. 

Such inductive definitions appear frequently in logic and mathematics, as 
we have already witnessed, which was the reason that compelled us to present 
an informal version of these results early on. 


VII.1.20 Theorem (Induction over an Inductively Defined Set). Let 𝒫(x)
be a formula. Then (∀x ∈ Cl(ℐ, ℱ)) 𝒫(x) is derivable from the assumptions

(i) (∀x ∈ ℐ) 𝒫(x) (basis), and
(ii) for each n-ary f ∈ ℱ,

    (∀a_1, ..., a_n)(f(a_1, ..., a_n)↓ → 𝒫(a_1) ∧ ··· ∧ 𝒫(a_n) → 𝒫(f(a_1, ..., a_n)))

Condition (ii) in the theorem is also pronounced “𝒫(x) propagates with each
operation in ℱ”. The part “𝒫(a_1) ∧ ··· ∧ 𝒫(a_n)” is the I.H. for f.
Proof. Let P be the class {x : 𝒫(x)}. By (i) ℐ ⊆ P, and by (ii) P is ℱ-closed.
The set X = ℐ ∪ ⋃_{f∈ℱ} ran(f) is also ℱ-closed, and ℐ ⊆ X.

Thus Z = X ∩ P is a set which satisfies (a) and (b) of Definition VII.1.16.
Hence Cl(ℐ, ℱ) ⊆ Z by Proposition VII.1.19, from which follows Cl(ℐ, ℱ)
⊆ P.

As the above proof suggests, proving by induction with respect to Cl(ℐ, ℱ)
(or (ℐ, ℱ)-induction) that (∀x ∈ Cl(ℐ, ℱ)) 𝒫(x) amounts to proving that the
class P = {x : 𝒫(x)} is ℱ-closed and that, moreover, ℐ ⊆ P.


VII.1.21 Definition. Given ℐ and ℱ. An n-tuple (x_1, ..., x_n) is an (ℐ, ℱ)-
derivation, or simply derivation if ℐ and ℱ are understood, iff for each
i = 1, ..., n, at least one of the following holds:

(a) x_i ∈ ℐ, or
(b) x_i = f(x_{j_1}, ..., x_{j_k}), where j_m < i for m = 1, ..., k and f ∈ ℱ is a k-ary
operation.†

We say that x_n is (ℐ, ℱ)-derived, or just derived if ℐ and ℱ are understood,
by the derivation (x_1, ..., x_n).

VII.1.22 Remark. It is clear that if (x_1, ..., x_n) is a derivation, then so is each
(x_1, ..., x_k) for 0 < k < n.

VII.1.23 Example. Let ℐ = {0}, ℱ = {λx.x ∪ {x}}. Then (0, 1, 2) and (0, 1, 0,
0, 1, 1, 0, 2) are (ℐ, ℱ)-derivations. They both derive 2.
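
The two derivations of Example VII.1.23 can be checked mechanically. The Python sketch below is an informal illustration. It codes the natural numbers as von Neumann ordinals built from frozenset, and it handles unary rules only, which suffices for this example. The names succ, ordinal and is_derivation are ours.

```python
def succ(x):
    """The operation λx.x ∪ {x} on hereditarily finite sets."""
    return x | frozenset([x])

def ordinal(n):
    """The von Neumann ordinal n, i.e., succ applied n times to the empty set."""
    x = frozenset()
    for _ in range(n):
        x = succ(x)
    return x

def is_derivation(seq, initial, unary_rules):
    """Check clauses (a) and (b) of Definition VII.1.21 at every position.

    Only unary rules are supported (an assumption of this sketch).
    """
    for i, x in enumerate(seq):
        earlier = seq[:i]
        if x in initial:                                   # clause (a)
            continue
        if any(f(y) == x for f in unary_rules for y in earlier):
            continue                                       # clause (b)
        return False
    return True

initial = {ordinal(0)}
d1 = [ordinal(k) for k in (0, 1, 2)]
d2 = [ordinal(k) for k in (0, 1, 0, 0, 1, 1, 0, 2)]
both_valid = (is_derivation(d1, initial, [succ])
              and is_derivation(d2, initial, [succ]))
```

By contrast, the pair (0, 2) is rejected: 2 is neither initial nor the successor of an earlier entry.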


VII.1.24 Theorem. For any ℐ and ℱ, {x : x is (ℐ, ℱ)-derived} = Cl(ℐ, ℱ).

Proof. Let us denote by D the class {x : x is (ℐ, ℱ)-derived}. First, we do
(ℐ, ℱ)-induction to show Cl(ℐ, ℱ) ⊆ D.

Basis: ℐ ⊆ D, since for each a ∈ ℐ, (a) is a derivation of a.

We next show that D is ℱ-closed. So let f ∈ ℱ be n-ary and let f(a_1, ..., a_n)↓,
where a_i ∈ D for i = 1, ..., n. By definition of D, there are derivations
(..., a_1), ..., (..., a_n).

Then (..., a_1, ..., ..., a_n) is a derivation (see Exercise VII.4), and therefore
so is (..., a_1, ..., ..., a_n, f(a_1, ..., a_n)) (why?). It follows that
f(a_1, ..., a_n) ∈ D. Thus D is f-closed, and hence ℱ-closed, since f ∈ ℱ was arbitrary.

By induction over Cl(ℐ, ℱ), we have obtained Cl(ℐ, ℱ) ⊆ D.

† We also say that x_i is obtained from k previous objects in the derivation by the application of a
k-ary operation from ℱ.
Next, to show the opposite inclusion, we do induction in ω − {0} with respect
to the length, n, of (ℐ, ℱ)-derivations.

Basis: n = 1: Let a ∈ D, where (a) is a derivation. Then a ∈ ℐ ⊆ Cl(ℐ, ℱ).

I.H.: Assume the claim for n ≤ k. Let n = k + 1, and (a_1, ..., a_k, a) be a
derivation of a ∈ D. If a ∈ ℐ, then a ∈ Cl(ℐ, ℱ) as in the basis step. Let then
a = f(a_{j_1}, ..., a_{j_r}). By the I.H., {a_{j_1}, ..., a_{j_r}} ⊆ Cl(ℐ, ℱ), since each of
(a_1, ..., a_{j_1}), ..., (a_1, ..., a_{j_r}) is a derivation of length ≤ k. But Cl(ℐ, ℱ) is closed
under f; hence a ∈ Cl(ℐ, ℱ).

In particular, D is a set.


The above theorem provides an alternative characterization of the set
Cl(ℐ, ℱ), which is more convenient when we want to prove that such and
such an x is in the set. On the other hand, the original definition (VII.1.16)
is more flexible to use when we try to prove properties of all the elements of
Cl(ℐ, ℱ), in which case we use (ℐ, ℱ)-induction. In such inductive proofs
we do not need to refer to the natural numbers, not even implicitly, since we do
not employ derivations.

Before we proceed with an alternative definition of finite sets, due to 
Whitehead and Russell, we present one more result on inductive definitions, 
which properly belongs here and will also be used later (for example, in the 
proof of the Cantor-Bernstein theorem). 


VII.1.25 Definition. Let X be a set. An operator over X is a function
Γ : P(X) → P(X).

Γ is monotone iff for every S ⊆ T ⊆ X, Γ(S) ⊆ Γ(T).

Thus, a monotone operator is total.

A set S ⊆ X is Γ-closed iff Γ(S) ⊆ S.


VII.1.26 Example. Let ℐ be a set, and ℱ a set of operations. For each f ∈ ℱ
we denote its arity by n(f). X will denote ℐ ∪ ⋃_{f∈ℱ} ran(f).

Define Γ_{ℐ,ℱ}, for simplicity referred to as just Γ in the balance of the example,
by Γ(Z) = ℐ ∪ ⋃_{f∈ℱ} ran(f ↾ Z^{n(f)}), for all Z ⊆ X.

Thus a set of initial objects, ℐ, and a set of operations, ℱ, give rise to an
operator over X. It is clear that Γ is monotone, since S ⊆ T ⊆ X implies
ran(f ↾ S^{n(f)}) ⊆ ran(f ↾ T^{n(f)}).


VII.1.27 Theorem. If Γ is a monotone operator over the set X, then the set

    S = ⋂ {Z : Z ⊆ X ∧ Γ(Z) ⊆ Z}

satisfies Γ(S) = S.
We call S a fixed point or fixpoint of Γ. It turns out that S is the ⊆-smallest fixed
point of Γ.

Proof. Let J = {Z : Z ⊆ X ∧ Γ(Z) ⊆ Z}. Now J is a nonempty set, since
X ∈ J ⊆ P(X). Hence, S is a set.

Next, we establish that Γ(S) ⊆ S, i.e., that Γ(⋂_{Z∈J} Z) ⊆ S. Indeed,

    Γ(⋂_{Z∈J} Z) ⊆ ⋂_{Z∈J} Γ(Z)      [by monotonicity, Γ(⋂_{Z∈J} Z) ⊆ Γ(Z)]
                 ⊆ ⋂_{Z∈J} Z = S

To conclude, we need to show that S ⊆ Γ(S). We proceed by contradiction:
Let, instead, x ∈ S − Γ(S). By monotonicity of Γ, Γ(S − {x}) ⊆ Γ(S) ⊆ S, and,
since x ∉ Γ(S), we have x ∉ Γ(S − {x}) as well. Thus Γ(S − {x}) ⊆ S − {x};
hence S − {x} ∈ J, and therefore S = ⋂_{Z∈J} Z ⊆ S − {x}, a contradiction.

VII.1.28 Remark. Γ(S) ⊆ S implies that S ∈ J, and therefore S is the
⊆-smallest set in J. If now Γ(T) = T, then also Γ(T) ⊆ T; hence T ∈ J and
consequently S ⊆ T. Thus the claim of the preceding note, that S is the
⊆-smallest fixed point of Γ, is correct.

One usually denotes this ⊆-smallest fixed point of Γ by Γ̄ or Γ^∞.
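
For a finite set X the theorem can be tested by brute force. The Python sketch below is an informal illustration. It computes S as the intersection of all Γ-closed subsets of X, and compares it with the result of iterating Γ from ∅, which for a monotone operator over a finite X is known to stabilize at the smallest fixed point. The sample operator gamma and all function names are ours.

```python
from itertools import combinations

X = frozenset(range(10))

def gamma(Z):
    """A sample monotone operator over X: always contribute 0, and z + 3 for z in Z."""
    return frozenset({0}) | frozenset(z + 3 for z in Z if z + 3 in X)

def subsets(universe):
    """Yield every subset of the finite set `universe` as a frozenset."""
    items = sorted(universe)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)

def lfp_by_intersection(op, universe):
    """S = intersection of all Z ⊆ universe with op(Z) ⊆ Z (Theorem VII.1.27)."""
    S = frozenset(universe)
    for Z in subsets(universe):
        if op(Z) <= Z:          # Z is op-closed
            S &= Z
    return S

def lfp_by_iteration(op):
    """Iterate op from the empty set until it stabilizes (finite case only)."""
    Z = frozenset()
    while op(Z) != Z:
        Z = op(Z)
    return Z

S = lfp_by_intersection(gamma, X)
```

Here both computations give {0, 3, 6, 9}, and applying gamma to that set returns it unchanged.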


VII.1.29 Corollary (Γ-Induction, or Induction over Γ̄). (∀x ∈ Γ̄) 𝒫(x),
where Γ is an operator over a set X, is derivable from a proof that
{x ∈ X : 𝒫(x)} is Γ-closed.

Proof. Let Z = {x ∈ X : 𝒫(x)}. By assumption, Γ(Z) ⊆ Z. Thus Z ∈ J (where
J is as in VII.1.27); hence Γ̄ ⊆ Z and therefore (∀x ∈ Γ̄) 𝒫(x).


VII.1.30 Remark. For an abstraction of what we are doing here see VI.5.47
and VI.5.49. Here the PO set (A, <) is (P(X), ⊆), and f is Γ.

Monotone operators are also called inductive. VII.1.29 provides a justification
for the name “inductive”, for it says that Γ̄ has the “property” 𝒫(x) if it
happens that the property “propagates with” Γ: If Z is the set of all x ∈ X which
satisfy 𝒫(x), then all the elements of Γ(Z) also satisfy the property.


VII.1.31 Example. We conclude Example VII.1.26 by showing that the
⊆-smallest fixed points of inductive operators generalize the notion of inductively
defined sets.†

† This generalization is proper, i.e., there are fixed points Γ̄ which cannot be inductively defined
as in Definition VII.1.16. These Γ require infinitary operations, i.e., operations with infinitely
many arguments.
Let S = Cl(ℐ, ℱ) and X be as in Example VII.1.26. We will show that
S = Γ̄_{ℐ,ℱ}. First,

    Γ_{ℐ,ℱ}(Z) ⊆ Z ↔ ℐ ∪ ⋃_{f∈ℱ} ran(f ↾ Z^{n(f)}) ⊆ Z

by definition of Γ_{ℐ,ℱ} (Example VII.1.26). In words,

    Γ_{ℐ,ℱ}(Z) ⊆ Z iff ℐ ⊆ Z and Z is ℱ-closed.    (1)

By Theorem VII.1.27,

    Γ̄_{ℐ,ℱ} = ⋂ {Z : Z ⊆ X ∧ Γ_{ℐ,ℱ}(Z) ⊆ Z}
             = ⋂ {Z : ℐ ⊆ Z ⊆ X ∧ Z is ℱ-closed}    by (1)
             = S

Finally, let us recognize (ℐ, ℱ)-induction as Γ_{ℐ,ℱ}-induction.

To prove (∀x ∈ S) 𝒫(x) by (ℐ, ℱ)-induction, we let P be the class
{x : 𝒫(x)} and prove that ℐ ⊆ P and P is ℱ-closed. If this plan succeeds, then
we have actually proved (see the proof of Theorem VII.1.20) that the set Z =
X ∩ P = {x ∈ X : 𝒫(x)} satisfies ℐ ⊆ Z and is ℱ-closed. By (1), this is
tantamount to proving that Z is Γ_{ℐ,ℱ}-closed. By Corollary VII.1.29, this proves
(∀x ∈ Γ̄_{ℐ,ℱ}) 𝒫(x), i.e., (∀x ∈ S) 𝒫(x), by Γ_{ℐ,ℱ}-induction.


We will return to inductive operators in Section VII.7. To conclude the 
present section, we resume the study of finite sets. 

Definition VII.1.3 was based on the intuitive notion of depleting a finite 
set in “finite time” by successively removing its elements, one at a time. The 
following alternative definition due to Whitehead and Russell (1912) (see also 
Levy (1979)) builds, rather than depletes, a finite set from “scratch” (i.e., from ∅)
in “finite time”, by successively adding elements to it. 

The definition given below is a variant of the original one, chosen in the 
context of the preceding groundwork on inductive definitions — and especially 
derivations — so as to make it clear that we are on the right track towards 
characterizing “finite” (see the following remark). 

We will use the term WR-finite† until we can show that the notions of “finite”
and “WR-finite” are equivalent.

† Whitehead-Russell-finite.
VII.1.32 Definition. A set A is WR-finite iff A ∈ Cl(ℐ, ℱ), where ℐ = {∅}
and ℱ = {f_y : dom(f_y) = P(A) ∧ y ∈ A ∧ f_y = λx.x ∪ {y}}.

If A is not WR-finite, then it is WR-infinite.

VII.1.33 Remark. Since A is a set, so is ℱ. By Theorem VII.1.24, A is
WR-finite iff there is some derivation (∅, ..., A) of A such that at each
non-redundant step† a set x ∪ {y} occurs, where x is available at an earlier step of
the derivation and y ∈ A.

Thus in a “finite number of steps” we obtain A by collecting its elements
together, one at a time, starting from ∅. So the definition is reasonable. Note
that the notion of natural number was only used implicitly (via the derivation
concept) in this remark; it does not occur in the Definition VII.1.32, not even
implicitly.
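
The derivation described in the remark is easy to exhibit for a concrete finite set. The Python sketch below is an informal illustration. It represents sets as frozenset values and lists the steps ∅, {y_1}, {y_1, y_2}, ..., A. The function name wr_derivation is ours, and the sorting of A is there only to make the output deterministic.

```python
def wr_derivation(A):
    """Return a derivation (∅, ..., A): each step is x ∪ {y} for the previous x and some y in A."""
    steps = [frozenset()]               # the initial object ∅
    for y in sorted(A):                 # any fixed order on A would do
        steps.append(steps[-1] | {y})   # apply f_y = λx.x ∪ {y} to the last step
    return steps

derivation = wr_derivation({"a", "b", "c"})
```

Each step adds exactly one element of A to the preceding step, so the list has one more entry than A has elements.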


VII.1.34 Proposition. ∅ is WR-finite.

Proof. Let ℐ = {∅} and ℱ = ∅. Since ℐ ⊆ Cl(ℐ, ℱ), it follows that
∅ ∈ Cl(ℐ, ℱ).


VII.1.35 Proposition. If A is WR-finite, then so is A ∪ {y} for any y.

Proof. We look at the interesting case where y ∉ A. By assumption, A ∈
Cl(ℐ, ℱ), where ℐ = {∅}, and ℱ is the set of all the total functions λx.x ∪ {z}
on P(A), for all z ∈ A.

Let 𝒢 = ℱ′ ∪ {λx.x ∪ {y}}, where ℱ′ contains exactly the ℱ-functions extended
to P(A ∪ {y}), so that for all z ∈ A ∪ {y}, dom(λx.x ∪ {z}) = P(A ∪ {y}).
A trivial (ℐ, ℱ)-induction shows that Cl(ℐ, ℱ) ⊆ Cl(ℐ, 𝒢) (see Exercise
VII.6). Hence, A ∈ Cl(ℐ, 𝒢) and therefore A ∪ {y} ∈ Cl(ℐ, 𝒢), since
Cl(ℐ, 𝒢) is closed under λx.x ∪ {y}.

VII.1.36 Remark. Here is an easier alternative proof: Let (∅, ..., A) be an
(ℐ, ℱ)-derivation. Then (∅, ..., A, A ∪ {y}) is an (ℐ, 𝒢)-derivation; hence
A ∪ {y} ∈ Cl(ℐ, 𝒢).

We prefer the original proof, because it avoids any reliance on the natural
numbers.

† At any step of the derivation we may place ∅; such a step is redundant in that it does not help to
progress with the formation of A.
VII.1.37 Theorem (Induction on WR-Finite Sets) (Zermelo (1909)). Let
𝒫(x) be a formula. To prove

    (∀x)(x is WR-finite → 𝒫(x))

it suffices to prove the following two things:

(a) 𝒫(∅), and
(b) (∀x)(∀y)(x is WR-finite → 𝒫(x) → 𝒫(x ∪ {y}))

Proof. Let 𝒫(x) be as above, and A be WR-finite. Then A ∈ Cl(ℐ, ℱ), where
ℐ = {∅} and ℱ is as in Definition VII.1.32. Let S = {x : x is WR-finite ∧
x ⊆ A ∧ 𝒫(x)}.

By (a) and VII.1.34, ℐ ⊆ S.

By (b) and Proposition VII.1.35, S is ℱ-closed. Therefore, by (ℐ, ℱ)-
induction, Cl(ℐ, ℱ) ⊆ S; hence A ∈ S. In particular, 𝒫(A).


VII.1.38 Theorem. A is finite iff it is WR-finite.

Proof. If part. We show that for some n ∈ ω, A ~ n. The proof is by induction
on WR-finite sets.

Basis: ∅ ~ 0.

I.H.: Let x be WR-finite, and for some n ∈ ω, f : x → n be a 1-1 correspondence.
Consider x ∪ {y}, and show this set to be equinumerous with some
natural number. If y ∈ x, then the result is the I.H. itself.

So let y ∉ x. Then f ∪ {⟨y, n⟩} provides a 1-1 correspondence x ∪ {y} →
n ∪ {n} = n + 1.

Only-if part. Let f : n → A be a 1-1 correspondence. Then f[n] = A. By
Exercises VII.7 and VII.8, A is WR-finite.


From now on we drop the qualification “WR-” from “finite”. What we are 
left with from all this, besides a better understanding of finite sets, is the useful 
proof technique of induction on finite sets. 


VII.1.39 Theorem. Let |A| = n, and < be a well-ordering on A. Then
(A, <) ≅ n.†

† By “≅ n” we mean, of course, “≅ (n, ∈)”. We are following the convention of Section VI.4.

Proof. By induction on n.

Basis: n = 0. The result is trivial.

I.H.: Assume the claim for n = k. Let n = k + 1. By Exercise VII.10, A
has a <-maximal element, say a. Now, a is also <-maximum, since < is a total
order. That is, x < a for all x ∈ A − {a}.

Now, |A − {a}| = k (see Exercise VII.12) and the I.H. yield (A − {a}, <) ≅ k.
By pairing a with k, we extend the previous ≅ to (A, <) ≅ k + 1.

The above result establishes the claim made in the preamble to this chapter,
namely, that there is a unique “length” for each finite set and that this length
coincides with the set’s cardinality. We add that, of course, every finite set is
well-orderable, a well-ordering being induced by A ~ |A| (see VI.3.12).
VII.2.1 Definition. A set S is enumerable iff S ~ ω. If S is either finite or
enumerable, then it is called countable.
If a set is not countable, then it is called uncountable.

Some authors use the term denumerable for enumerable. Also, the term at most
enumerable is sometimes used for countable.

VII.2.2 Example. According to Definition VII.2.1 each finite set is also countable.
We also observe that ω is enumerable (since ω ~ ω), so enumerable sets
exist. Do uncountable sets exist? In other words, are there infinite sets which
are not enumerable? Cantor, as we will see in the next section, answered this
affirmatively.

VII.2.3 Example. The set of the even natural numbers, E, is enumerable, since
λx.2x : ω → E is a 1-1 correspondence. A similar comment is true for the set
of the odd natural numbers.

VII.2.4 Example. Every enumerable set is infinite. Indeed, if A ~ ω, then (see
Exercise VII.2) A is finite iff ω is. But ω is not finite.

If now A ⊆ B and A is enumerable, then B is infinite. This is so because of
Corollary VII.1.10.
VII.2.5 Theorem. If A ⊆ ω, then A is countable.

Proof. If A is finite, then we are done. So let A be infinite.
Define f by induction on n as follows:

    f(0) = min A
    and if n > 0,                                    (i)
    f(n) = min (A − ran(f ↾ n))

Observe that

(1) Since the recursion (i) is pure, dom(f) ≤ ω. Say, dom(f) = n ∈ ω (for
some n). Thus, A = f[n].† By Exercise VII.18 (or VII.7), A is finite, a
contradiction. Thus f is total on ω.

(2) f is 1-1. Indeed, let n ≠ m, where we assume, without loss of generality,
that m < n. Hence f(m) ∈ ran(f ↾ n); therefore f(n) ≠ f(m) by the second
part of (i).‡

(3) A = ran(f). Let us assume instead that, for some m, m ∈ A − ran(f). Since
ω ~ ran(f) (why?), ran(f) is infinite; therefore, for some n,

    m ≤ f(n)                                         (ii)

for, otherwise, (∀n ∈ ω) f(n) < m, i.e., ran(f) ⊆ m, making ran(f) finite.
Indeed, (ii) graduates to m < f(n), the strict inequality being justified from
m ∉ ran(f). This last observation is inconsistent with the definition of f(n)
(second equation of (i)) since both m and f(n) are in A − ran(f ↾ n).

Items (1) through (3) establish that ω ~ A.
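
The recursion (i) can be run directly when membership in A is decidable. The Python sketch below is an informal illustration. It presents A by a hypothetical membership test in_A, assumes A is infinite (otherwise the search for a minimum would not terminate, matching f(n)↑ in the finite case), and uses the name f_rec for the function f of the proof.

```python
from itertools import count

def f_rec(n, in_A):
    """Compute f(n) = min(A − ran(f ↾ n)) for A = {k : in_A(k)}, assuming A is infinite."""
    used = set()       # ran(f ↾ i) after i rounds
    value = None
    for _ in range(n + 1):
        # Least element of A not yet enumerated.
        value = next(k for k in count(0) if in_A(k) and k not in used)
        used.add(value)
    return value

# A = {1, 4, 7, 10, ...}
first_values = [f_rec(n, lambda k: k % 3 == 1) for n in range(5)]
```

The values come out strictly increasing, in line with items (2) and (3): no element of A is repeated and none is skipped.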


VII.2.6 Corollary. A nonempty set A is countable iff there is an onto function
f : ω → A.§

Proof. Only-if part. Let A be finite and f : n → A be a 1-1 correspondence
for some n ∈ ω. Trivially, f is a (nontotal) function on ω, and is onto A.

If, on the other hand, A is infinite, then there is a 1-1 correspondence
f : ω → A for some f. This f is onto.

If part. If A is finite, we are done. So let A be infinite, and let f : ω → A
be onto.

† f(n)↑ entails A ⊆ ran(f ↾ n).
‡ See also Exercise VII.20.
§ We emphasize that f need not be total.
By V.3.9, there is a right inverse, g : A → ω, of f, in the sense that f ∘ g = 1,
where 1 is the identity on A.

By V.3.4, g is total and 1-1; thus A ~ ran(g). By Exercise VII.2, ran(g) is
infinite; by Theorem VII.2.5, ω ~ ran(g); hence ω ~ A. (See also Exercise VII.21.)

It is clear that, if we want, we can state the corollary so that f is total. Indeed,
if f : A → B is nontotal but onto, then we can always extend it to a total and
onto function h by taking h = f ∪ {⟨x, b⟩ : x ∈ A − dom(f)}, where b is any
fixed element of B. Thus, whereas the original definition (Definition VII.2.1)
of A being enumerable requires that an enumeration without repetitions
exists (this is the 1-1 correspondence f : ω → A), we now have relaxed this
by saying, via Corollary VII.2.6, that a nonempty set A is countable iff an
enumeration exists (possibly with repetitions, the 1-1-ness requirement being
dropped).


VII.2.7 Example. If A is countable and B ⊆ A, then B is countable. Indeed,
if B = ∅, then the result is immediate.

So, let B ≠ ∅, and let (by Corollary VII.2.6) f : ω → A be onto. The total
function i = λx.x on B is 1-1. Thus the inverse relation, i^{-1}, is an onto (but
nontotal, unless A = B) function i^{-1} : A → B. Clearly i^{-1} ∘ f : ω → B is
onto.

VII.2.8 Example. If A and B are countable, then so are A ∩ B and A − B.
This follows from A ∩ B ⊆ A and A − B ⊆ A.

Note that the hypothesis could be weakened to simply require that just A be
countable.


VII.2.9 Proposition. ω ~ ω × ω.


Proof. By VI.7.8.


There is a more “elementary” proof that avoids the J of VI.7.4 and uses just
multiplication and addition on ω.

We start with the total function f = λmn.(m + n)·(m + n) + m on ω².
(By the way, ω² is in the Cartesian product sense throughout. Ordinal
exponentiation would have used an exponent “·2” instead, and cardinal
exponentiation we have not introduced yet.)

One can easily derive that f is 1-1, relying on what we know about ordinal
addition and multiplication (cf. VI.10.11 and VI.10.19). Indeed, assume that
f(m, n) = f(m′, n′) (m, n, m′, n′ in ω), that is,

    (m + n)·(m + n) + m = (m′ + n′)·(m′ + n′) + m′          (1)

and prove

    m = m′ ∧ n = n′          (2)

Well, if m + n = m′ + n′, then m = m′ from (1) by VI.10.11, and then n = n′.
We show that m + n ≠ m′ + n′ cannot apply; then we are done. So let instead
m + n < m′ + n′. Thus m + n + 1 ≤ m′ + n′. It follows† that

    (m + n)·(m + n) + m + m + n + n + 1 ≤ (m′ + n′)·(m′ + n′)

Hence

    (m + n)·(m + n) + m + m + n + n + 1 ≤ (m′ + n′)·(m′ + n′) + m′

By (1) and VI.10.11 again, m + n + n + 1 ≤ 0, which is absurd.
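The injectivity of f = λmn.(m + n)·(m + n) + m can be spot-checked numerically. The sketch below only tests a finite grid, so it illustrates the claim rather than proving it; the helper name `is_injective_on_grid` and the grid bound are assumptions made for the illustration.

```python
def f(m, n):
    """The map (m, n) -> (m + n)*(m + n) + m from the text."""
    return (m + n) * (m + n) + m


def is_injective_on_grid(bound):
    """Check that f takes distinct values on {0..bound-1} x {0..bound-1}."""
    seen = {}
    for m in range(bound):
        for n in range(bound):
            value = f(m, n)
            if value in seen:        # two different pairs with the same value
                return False
            seen[value] = (m, n)
    return True
```

Note that f is not onto ω (for instance, 3 is not a value of f), which matches the observation that the inverse relation is a nontotal function.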


Thus the inverse relation f⁻¹ : ω → ω² is a (nontotal) onto function. By
VII.2.6, ω² is countable. Since it is infinite,‡ it is enumerable.


VII.2.10 Corollary. ω ~ ωⁿ for n ≥ 2.


Proof. ωⁿ⁺¹ = ωⁿ × ω. Now use induction on n in ω − {0, 1}, and VII.2.9.


We can also view the above as a theorem schema (one theorem for each n ∈ N,
rather than the single theorem (∀n ∈ ω)(n ≥ 2 → ω ~ ωⁿ)) and prove it by
informal induction on N − {0, 1}.


VII.2.11 Proposition. If A_i is countable for i = 1, ..., n, then so is
⨉_{i=1}^{n} A_i.


Theorem schema.


Proof. Let f_i : ω → A_i be onto for i = 1, ..., n. Then (f_1, ..., f_n) : ωⁿ →
⨉_{i=1}^{n} A_i is onto, where (f_1, ..., f_n) denotes the function
λx_1 ... x_n.(f_1(x_1), ..., f_n(x_n)).


† By “squaring”. We are using the distributive law and commutativity of + and · on ω freely;
cf. VI.10.19(iv).
‡ ω² ⊇ {0} × ω ~ ω, the ~ obtained via (0, n) ↦ n (cf. Exercise VII.2).
To conclude, we need an onto function g : ω → ωⁿ, since (f_1, ..., f_n) ∘ g :
ω → ⨉_{i=1}^{n} A_i will then be onto. This we have by VII.2.10.


Informally, using N for ω, one can conclude the above argument in this
alternative manner (without invoking VII.2.10): Let

    h = λx_1 ... x_n. p_1^(x_1+1) · p_2^(x_2+1) ··· p_n^(x_n+1)

where p_i is the ith prime (p_0 = 2, p_1 = 3, p_2 = 5, ...). By the (informal)
prime factorization theorem, h : Nⁿ → N is 1-1. Trivially, it is also total. Thus
if g = h⁻¹, the inverse relation, g : N → Nⁿ is an onto function.

Of course, armed with sufficient strength of will (and time and space), one
can develop the properties of the formal natural numbers to the point that one
proves the prime factorization theorem within ZFC (ZF suffices, actually). Then
one can turn the above informal reasoning fragment to a formal one.
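The prime-power coding of tuples lends itself to a short experiment. The following is a sketch under stated assumptions: it attaches the exponents x_1 + 1, ..., x_n + 1 to the first n primes starting from 2 (the exact indexing of the primes is an assumption here), and the names `first_primes`, `encode`, and `decode` are hypothetical.

```python
def first_primes(k):
    """Return the first k primes by trial division (fine for small k)."""
    primes = []
    candidate = 2
    while len(primes) < k:
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes


def encode(xs):
    """Total, 1-1 coding h: multiply the i-th prime raised to x_i + 1."""
    code = 1
    for p, x in zip(first_primes(len(xs)), xs):
        code *= p ** (x + 1)
    return code


def decode(code, n):
    """Inverse relation g = h^(-1): returns None where g is undefined."""
    xs = []
    for p in first_primes(n):
        exponent = 0
        while code % p == 0:
            code //= p
            exponent += 1
        if exponent == 0:            # every prime must occur at least once
            return None
        xs.append(exponent - 1)
    return tuple(xs) if code == 1 else None
```

Here `decode` is nontotal (it returns `None` on numbers that are not codes) but onto the n-tuples, mirroring the role of g in the text.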


VII.2.12 Proposition. If for i = 1, ..., n, A_i is countable, then so is
⋃_{i=1}^{n} A_i.


The above is stated as a schema, one theorem for each n ∈ N. A formal version
uses a function f with dom(f) = ω. It takes the form

    (∀n ∈ ω)((∀i ∈ n) f(i) is countable → ⋃{f(i) : i ∈ n} is countable)

and is proved pretty much like the schema version.

Proof. Let f_i : ω → A_i be onto for i = 1, ..., n. Define g : ω × ω → ⋃_{i=1}^{n} A_i
by g(i, x) = f_i(x) for all x ∈ ω and i = 1, ..., n. Clearly, g is onto, for if
a ∈ ⋃_{i=1}^{n} A_i, then, say, a ∈ A_m for some m. By ontoness of f_m, there is an
x ∈ ω such that f_m(x) = a, that is, g(m, x) = a. Now invoke VII.2.6 and VII.2.9.


We observe that the g of the previous proof is not total. 


VII.2.13 Theorem (A Countable Union of Countable Sets Is Countable).
If, for all i ∈ ω, A_i is countable, then so is ⋃_{i∈ω} A_i.


Proof. Let f_i : ω → A_i be onto for all i ∈ ω. Define g on ω × ω as in the
previous proof, and proceed in an identical fashion.
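The function g(i, x) = f_i(x) composed with an enumeration of ω × ω can be sketched as a generator. This is an illustration only: it assumes the onto functions f_i are handed to us explicitly as total Python callables (so the choice of the f_i has already been made), and the names `union_enumeration` and `multiples` are hypothetical.

```python
from itertools import count, islice


def union_enumeration(family):
    """Yield g(i, x) = f_i(x) over all pairs (i, x), one anti-diagonal at a time.

    family(i) is assumed to return a total onto function f_i : N -> A_i.
    """
    for s in count():                # s = i + x
        for i in range(s + 1):
            yield family(i)(s - i)


def multiples(i):
    """Hypothetical family: A_i is the set of multiples of i + 1."""
    return lambda x: (i + 1) * x


first = list(islice(union_enumeration(multiples), 10))
```

The output contains repetitions, which is harmless: by Corollary VII.2.6 an enumeration with repetitions is enough for countability.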


VII.2.14 Remark. (1) The nickname of the theorem has the obvious justification,
as the family (A_i)_{i∈ω} is countable, indeed enumerable.

(2) The proof of Theorem VII.2.13 involved (tacitly) the axiom of choice.
This happened during the definition of g, where one out of, possibly, several f_i
was (tacitly) chosen for each i ∈ ω. The omitted details are as follows:

Since {f : f : ω → A_i is onto} ≠ ∅ for i ∈ ω, there is an h in
∏_{i∈ω} {f : f : ω → A_i is onto}. For each i ∈ ω, h(i) is the f_i used in the proof.

Was this a peculiarity of this particular proof? No, as a result of Feferman 
and Levy (1963) shows: without the axiom of choice we may have a countable 
union of countable sets that turns out to be uncountable. 

(3) The axiom of choice is provable for finite sets of sets, as we already 
know. Thus to construct g in the proof of Proposition VII.2.12 we did not need 
AC to select one (out of the possibly many) f_i for each i = 1, ..., n.

(4) In each of the results VII.2.11, VII.2.12, and VII.2.13, if any of the A_i is
enumerable, then so are the corresponding product ⨉ A_i and union ⋃ A_i.
For the case of ⋃ this follows from VII.1.10; for the other case see
Exercise VII.15.


VII.2.15 Example. ⋃_{n=1}^{∞} ωⁿ is enumerable by Theorem VII.2.13, Corollary
VII.2.10, and Remark VII.2.14(4). A direct proof (assuming an undertaking to
develop enough arithmetic in ZF) bypasses the axiom of choice in this special
case.

Let f(x) = (x_0, ..., x_m) whenever x = p_0^(x_0+1) · p_1^(x_1+1) ··· p_m^(x_m+1). By the
prime factorization theorem, this leads to an onto (but nontotal) function
f : ω → ⋃_{n=1}^{∞} ωⁿ.


VII.2.16 Example (Informal). Let us see that A U B is countable whenever 
A and B are, this time using an elementary (informal) technique traceable 
back to Cantor, rather than observing that this statement is a special case of 
Proposition VII.2.12. 

According to hypothesis (see the discussion following Corollary VII.2.6), A
is enumerated (possibly with repetitions) as

    a_0, a_1, a_2, ...

and B is enumerated as

    b_0, b_1, b_2, ...

Following the arrows in the diagram below, we trace an enumeration of A ∪ B:

    a_0     a_1     a_2     a_3   ...
     ↓   ↗   ↓   ↗   ↓   ↗   ↓
    b_0     b_1     b_2     b_3   ...


VII.2.17 Example (Informal). We present an informal proof that ω² is
enumerable by providing an enumeration diagrammatically, as in the previous
example:

    (0, 0)   (0, 1)   (0, 2)   (0, 3)  ...
           /        /        /
    (1, 0)   (1, 1)   (1, 2)  ...
           /        /
    (2, 0)   (2, 1)  ...
           /
    (3, 0)  ...
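The diagram can be read as walking the table one anti-diagonal at a time. The sketch below fixes one such walk; the direction of travel along each diagonal (from the top row downward) is an assumption, since either direction yields an enumeration. The names `pairs` and `position` are hypothetical.

```python
from itertools import count, islice


def pairs():
    """Enumerate N x N without repetitions, anti-diagonal by anti-diagonal."""
    for s in count():                # s = m + n is constant on a diagonal
        for m in range(s + 1):
            yield (m, s - m)


def position(m, n):
    """Index at which pairs() emits (m, n)."""
    s = m + n
    return s * (s + 1) // 2 + m
```

Because every pair lies on exactly one anti-diagonal, each pair is emitted exactly once, so this is an enumeration without repetitions.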


VII.2.18 Example (Informal). It is easy to see that the set of integers,
Z = {..., −1, 0, 1, ...}, is enumerable. One way to do this is to observe first that
λx.−x is a 1-1 correspondence between (the real) ω (if you prefer, you may
use the real alternative, N) and {..., −1, 0}, the non-positive numbers NP.

Then observe that Z = NP ∪ ω, and invoke either Proposition VII.2.12 or
Example VII.2.16.

Another way to do the same is to view Z as {0, 1} × N (or {0, 1} × ω, using
the real ω), where (0, n) stands for n ∈ N, whereas (1, n) stands for −n for
n ∈ N. By Proposition VII.2.11 (see also Remark VII.2.14(4)), once again, Z
is enumerable.


VII.2.19 Example (Informal). We next see that Q, the set of rational numbers, 
is enumerable. 

This may appear, at first sight, surprising, because of the density of rational 
numbers: Between any two rational numbers r and s there is another rational 
number, for example, (r + s)/2. Thus, intuitively, there seem to be “more” 
numbers in Q than in N (which does not enjoy a density property). Well, intuition 
can be wrong in connection with the cardinality of infinite sets. (We will see 
another counterintuitive result in Section VII.4, namely, that there are as many
reals in the unit square, [0, 1]², as there are in the unit segment, [0, 1]. See also
Exercise VII.35.)

The justification of the claim is straightforward:

    Q = {m/n : m ∈ Z ∧ n ∈ N − {0}}          (1)

Since N ~ N − {0} via λx.x + 1,

    Z × (N − {0}) ~ N          (2)

by Proposition VII.2.11 and Remark VII.2.14(4).
Since the function Z × (N − {0}) → Q given by (m, n) ↦ m/n is onto, (2)
and (1) yield an onto function N → Q.
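The onto function N → Q can be made concrete by composing an enumeration of Z, the shift x ↦ x + 1, and a walk over pairs. This is a hedged sketch, not the construction of the text verbatim: the particular enumeration of Z (0, 1, −1, 2, −2, ...) and the names `integer` and `rationals` are assumptions, and repetitions are filtered out only to make the output easier to read.

```python
from fractions import Fraction
from itertools import count, islice


def integer(k):
    """An onto (indeed 1-1) map N -> Z: 0, 1, -1, 2, -2, ..."""
    return (k + 1) // 2 if k % 2 else -(k // 2)


def rationals():
    """Enumerate Q by walking pairs (i, x) and mapping them to integer(i)/(x + 1)."""
    seen = set()
    for s in count():
        for i in range(s + 1):
            q = Fraction(integer(i), (s - i) + 1)   # denominator lies in N − {0}
            if q not in seen:                       # drop repetitions such as 0/2
                seen.add(q)
                yield q
```

Dropping the `seen` filter gives an enumeration with repetitions, which by Corollary VII.2.6 already suffices for countability.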


VII.2.20 Example (Informal). Let us “count” the polynomials of one variable
(say, x), with integer coefficients.

Such a polynomial is a function of x, whose value for each x is given by the
expression

    a_0 + a_1 x + a_2 x² + ··· + a_n xⁿ,   for short,   Σ_{i=0}^{n} a_i x^i.

The a_i are the coefficients, and, in this example, they are in Z. Whenever a_n ≠ 0,
we say that the degree of the polynomial is n. We identify each nth-degree
polynomial, Σ_{i=0}^{n} a_i x^i, with the (n + 1)-tuple (a_0, ..., a_n).
It follows that the set of nth-degree polynomials is Zⁿ⁺¹ and therefore the
set of all polynomials is

    ⋃_{n≥0} Zⁿ⁺¹

Since we already know that ω ~ Z (Example VII.2.18), the set of polynomials
is enumerable, by Example VII.2.15. (See also Exercises VII.26 and VII.27.)


We now turn to a characterization of infinite sets due to Dedekind (1888).† This
characterization is contained in the following definition. 


VII.2.21 Definition. A set A is Dedekind-infinite iff it is equinumerous with 
some of its proper subsets. Otherwise, it is Dedekind-finite. 


† This characterizing property of infinite sets was also observed by Cantor and Bolzano. See also
Wilder (1963, p. 65). 
VII.2.22 Remark. A set which is Dedekind-infinite is also infinite.

We prove the equivalent contrapositive statement, that if a set A is finite in
the “ordinary” sense (Definition VII.1.3), then it is also Dedekind-finite (i.e.,
not Dedekind-infinite). Indeed, if f : A → n ∈ ω is a 1-1 correspondence and
if B is any proper subset of A, then

(1) B ~ f[B], and
(2) f[B] ⊂ n.

By Corollary VII.1.7, f[B] ≁ n; hence f[B] ≁ A. By (1), B ≁ A. Since B
was an arbitrary proper subset of A, our conclusion is that A is not
Dedekind-infinite.

In what follows we will show the equivalence of the two definitions of finite 
(or infinite). 


For the balance of this discussion, “infinite” and “finite”, without the qual- 
ification “Dedekind”, refer to the ordinary notions, as per Definition VII.1.3. 


VII.2.23 Lemma. Every infinite set has an enumerable subset. 


Proof. Let A be infinite. By VI.5.54, there is an α and a 1-1 correspondence
f : α → A. By Exercise VII.2, α is infinite; hence ω ≤ α, i.e., ω ⊆ α (otherwise,
α < ω, whence α is a natural number and therefore finite).

The set f[ω], being an enumerable subset of A, settles the issue.


VII.2.24 Lemma. If A is infinite and B is countable, then A ∪ B ~ A.


Proof. Let C = B − A, and D be an enumerable subset of A.

By Example VII.2.8, C is countable. Thus D ∪ C ~ D (see
Remark VII.2.14(4)). Extend the above 1-1 correspondence to one between A ∪ B
and A, as follows: Let each x ∈ A − D correspond to itself, and observe that
A ∩ C = ∅, A ∪ B = A ∪ C = (A − D) ∪ (D ∪ C), and A = (A − D) ∪ D.


VII.2.25 Theorem. The notions “infinite” and “Dedekind infinite” are
equivalent.


Proof. That Dedekind infinite implies infinite is the content of Remark VII.2.22.
So let next A be infinite.


Case 1. There is a 1-1 correspondence f : ω → A, i.e., A is enumerable.
From ω ~ ω − {0} (via x ↦ x + 1) it follows that f[ω − {0}] ~ A; moreover,
f[ω − {0}] ⊂ A.
Case 2. A is not enumerable. By Lemma VII.2.23, A has an enumerable
subset B. By Exercise VII.16, A − B is infinite (otherwise A = (A − B) ∪ B
is enumerable). By Lemma VII.2.24, A ~ A − B.


In the next section, among other things, we see that case 2 above is not
vacuous.
In this section we study the important technique of diagonalization through
several examples. It was devised by Cantor in order to prove that the set of
real numbers is uncountable, and it has since found many applications in logic,
such as in the proof of Gödel’s incompleteness theorems and later in recursion
theory (and its offspring, computational complexity). To a large extent,
recursion theory and complexity theory are the art and science of diagonalization.


Described generally, given a “square table” (that is, a total function F : A ×
A → X), this method defines an A-long array of elements of X (that is, a
function D : A → X) that is different from all the “rows” of the table (where
the hth “row” of F is the function H = λx.F(h, x)). The idea, due to Cantor,
can be described like this:


Start with the function λx.F(x, x) (the “main diagonal” of the table). If this
were to be the same as the hth row, then, in particular, H(h) = F(h, h).†

Suppose now that we want to build an A-long array that cannot equal the
hth row. It suffices to take a modified diagonal: Just change the entry F(h, h).


It is clear now what needs to be done to get an A-long array D that fits
nowhere as a row. Take, again, the diagonal, but change every one of its entries.
That is,

    D(x) = some element in X − {F(x, x)}          (1)

Clearly this works, i.e., for each a ∈ A, D ≠ λy.F(a, y); for (1) yields D(a) ≠
F(a, a).

We will illustrate the above general description of diagonalization in the 
following examples. 
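Before turning to the examples, the general recipe (1) can be tried on a small finite table. This is only a sketch: the table, the function name `diagonalize`, and the particular way of changing an entry (adding 1) are assumptions chosen for illustration.

```python
def diagonalize(table, change):
    """Return the array D with D[a] = change(F(a, a)).

    table[a][x] plays the role of F(a, x); change(v) must differ from v.
    """
    return [change(table[a][a]) for a in range(len(table))]


# A hypothetical 3 x 3 "square table" with integer entries.
table = [
    [0, 1, 2],
    [5, 5, 5],
    [7, 8, 9],
]
d = diagonalize(table, lambda v: v + 1)
```

The resulting array differs from row a at position a, so it cannot coincide with any row of the table.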


VII.3.1 Example. Let (f_n)_{n∈ω} be a countable family of total functions
f_n : ω → ω.


† Intuitively, the main diagonal was “rotated” counterclockwise, by 45 degrees, around the pivot
entry F(h, h).
The function d = λx.f_x(x) + 1 is different from each f_x at input x; thus
d does not belong to the family. Let us verify: If d = f_i for some i ∈ ω, then
d(i) = f_i(i). On the other hand, by the definition of d, d(i) = f_i(i) + 1.

We conclude that f_i(i) = f_i(i) + 1, a contradiction, since both sides of “=”
are defined.†

Finally, we note that the argument just presented indeed fits the general
description of diagonalization. Here the “table” is F = λmn.f_m(n), and one
particular way to build D of (1) (here called “d”) is to make d(x) different from
F(x, x) by adding 1 to the latter.


d(x) = f_x(x) + x + 1 would have worked too.


VII.3.2 Proposition. The set A = ^ω ω is uncountable.

Recall that ^X Y is the set of all total functions X → Y.

Proof. If A were countable, then there would be an enumeration (f_n)_{n∈ω} of all
its members. Example VII.3.1 shows that this is impossible, as any such
(attempted) enumeration will omit at least one member of A, for example d.


VII.3.3 Corollary. The set ^ω 2 is uncountable.


Proof. Exercise VII.30. 


VII.3.4 Example (Informal). Diagonalization is often applied to define a class
that does not belong to an indexed family of classes (P(x))_{x∈J}, where J is a
class and P a relation on J (we are speaking informally, ignoring issues of
left-narrowness). This is done by defining the “diagonal class” E, setting

    E = {x ∈ J : x ∉ P(x)}          (1)

or

    E = {x ∈ J : ¬ x P x}          (1′)

Clearly this works, i.e., E ≠ P(x) for all x ∈ J, for if

    E = P(a) for some a ∈ J          (2)


† Clearly, this argument breaks down if the family (f_n)_{n∈ω} contains nontotal functions, in which
case we employ Kleene’s weak equality ≃. Then it is possible to have f_i(i)↑, in which case
f_i(i) ≃ f_i(i) + 1.
then

    a ∈ P(a)  ⟺  a ∈ E  [by (2)]  ⟺  a ∉ P(a)  [by (1)]

which is a contradiction.


This is not a new flavour of diagonalization, but fits under the general
discussion on p. 451 above. Indeed, think of the “table” F : J × J → 2 defined
by

    F(x, y) = 0   if x P y
              1   otherwise

The general discussion would lead to the “diagonal object”

    D = λx.1 − F(x, x)          (3)

which is a J-long 0-1-valued array that cannot be a row of the “table”. “D” is
just another way of saying “E”, for the former is the characteristic function of
the latter, as the following equivalences show (for x ∈ J):

    D(x) = 0  ⟺  F(x, x) = 1  ⟺  ¬ x P x  ⟺  x ∈ E


As an application, let us look at the family (x)_{x∈S}, where S is any class of sets.
Here J = S and P is the identity. Let D = {x ∈ S : x ∉ x}, i.e., x ∈ D iff
x ∈ S ∧ x ∉ x. Thus D behaves at x differently than x at x, and therefore it is
not one of the x’s; in other words, D ∉ S (this is so because this diagonalization
tells us that D is not in the family (x)_{x∈S}, but this family equals S).

So, by diagonalization, we have obtained an object not in S. If we now let
S be Vy, the class of all sets, then D is the Russell class, and the above
argument establishes (once again) that D ∉ Vy, i.e., that D is not a set.
Therefore, Russell’s proof was a diagonalization over all sets to obtain an
object which is not a set, and while ingenious and elegantly simple, the
technique was borrowed from Cantor’s work.


With practice, one can expand the applicability of diagonalization to more
general situations, for example, cases where we apply some transformation
(function) to one of the table “coordinates”. Say, given P and J as in VII.3.4,
if G : J → J is a function, then the class E = {G(x) ∈ J : G(x) ∉ P(x)} is
different from all P(x) (x ∈ J). Similarly, if f : ω → ^ω ω is an enumeration
of total functions ω → ω, then d = λx.f(x)(x) + 1 is not in the range of f‡


† F is the characteristic function of P.
(otherwise, d = f(a) for some a, hence d(a) = f(a)(a), but also d(a) =
f(a)(a) + 1).


VII.3.5 Example (Informal). Throughout, [0, 1] will denote the real closed
interval, {x ∈ R : 0 ≤ x ≤ 1}. Since Q, the set of rational numbers, is
enumerable, then so is [0, 1] ∩ Q = {x ∈ Q : 0 ≤ x ≤ 1} (see Example VII.2.7). Now
each rational in [0, 1] has a decimal expansion 0.a_0 a_1 ... a_i .... For example,

    1 = 0.99... (all 9’s),   0 = 0.00... (all 0’s),   1/3 = 0.33... (all 3’s)

and

    1/2 = 0.500... (all 0’s)   but also   1/2 = 0.499... (all 9’s)

We next claim that the set of all decimal expansions of rationals in [0, 1]
is enumerable. Indeed, this set equals Q_inf ∪ Q_fin, where Q_inf is the set of all
infinite representations such as 0.33..., 0.99..., 0.499..., whereas Q_fin is
the set of all finite representations, i.e., those that terminate with an infinite
sequence of 0’s, such as

    0.00... (all 0’s)   and   0.500... (all 0’s)

Note that some rationals have both infinite and finite representations.

By Exercise VII.31, Q_inf is equinumerous to an infinite subset of Q, and
also Q_fin is equinumerous to an infinite subset of Q; hence Q_inf ∪ Q_fin ~ ω,
as required. So we have an enumeration (0.a_0^n a_1^n ... a_i^n ...)_{n∈ω} of all decimal
expansions of the rational numbers in [0, 1].

Consider the decimal expansion d = 0.d^0 d^1 ... d^i ..., where for all i ∈ ω,
d^i ≠ a_i^i. For example, a well-defined way to achieve this is to set

    d^i = 2   if a_i^i = 1
          1   otherwise

By diagonalization, d does not belong to the family (0.a_0^n a_1^n ... a_i^n ...)_{n∈ω}.
Since the latter represents all the rationals in [0, 1], and since d represents a
real in [0, 1] it follows that d is an irrational number in [0, 1].
One can now continue to discover more irrationals in the interval by adding
d at the beginning of the enumeration and then diagonalizing again to obtain

‡ This type of argument shows, in recursion theory, that the set of all total computable functions
cannot be “effectively” enumerated.
a new irrational d′ (each irrational has a unique expansion, an infinite one).
The reader may wish to refer to Wilder (1963, p. 89) to see a very interesting
extension of this type of discussion. In particular, Wilder applies this type of
diagonal technique to the set of algebraic numbers in [0, 1] to “construct”
transcendental, i.e., non-algebraic, numbers (refer also to Exercise VII.27).
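The digit rule of this example (use 2 where the diagonal digit is 1, and 1 otherwise) is easy to apply to a finite prefix of a list of expansions. The sketch below works on finitely many rows and digits only, so it illustrates the rule rather than the infinite construction; the rows and the name `diagonal_digits` are hypothetical.

```python
def diagonal_digits(expansions):
    """expansions[n][i] is the i-th digit of the n-th listed expansion.

    Returns the first len(expansions) digits of the diagonal number d.
    """
    return [2 if expansions[i][i] == 1 else 1 for i in range(len(expansions))]


# Hypothetical first four digits of four listed expansions.
rows = [
    [5, 0, 0, 0],   # 0.5000...
    [3, 3, 3, 3],   # 0.3333...
    [4, 9, 9, 9],   # 0.4999...
    [0, 0, 0, 1],   # 0.0001...
]
d = diagonal_digits(rows)
```

Because d uses only the digits 1 and 2, it ends neither in all 0’s nor in all 9’s, so it has no second decimal representation that could sneak into the list.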


VII.3.6 Example (Informal: Cantor). The set of real numbers in [0, 1] is
uncountable.

Suppose instead that [0, 1] ~ ω. Then, entirely analogously with
Example VII.3.5, [0, 1]_fin ∪ [0, 1]_inf ~ ω, where [0, 1]_fin is the set of all finite, and
[0, 1]_inf the set of all infinite, decimal expansions of reals in [0, 1], and therefore
there is an enumeration (0.a_0^n a_1^n ...)_{n∈ω} of the members of [0, 1]_fin ∪ [0, 1]_inf.

However, the existence of the diagonal number d, defined as in
Example VII.3.5, leads to a contradiction: On one hand d cannot be in the
enumeration. On the other hand,

    d = 0.d^0 d^1 ...   (1’s and 2’s)

is a decimal expansion of a real in [0, 1]. Hence it is in the enumeration.
This contradiction establishes the claim.


VII.3.7 Proposition (Cantor). The set P(ω) is uncountable.

Compare with Exercise VII.19.


Proof. Let us assume the contrary, i.e., that there is a 1-1 correspondence
f : ω → P(ω). Construct the diagonal set D = {x ∈ ω : x ∉ f(x)}.† Thus on
one hand D is not in the range of f; on the other hand it must be, since D ⊆ ω.
This contradiction establishes the claim.
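The diagonal set D = {x : x ∉ f(x)} can be examined exhaustively on a small finite set, where the same argument shows that no function from the set to its power set is onto. The sketch is a finite analogue, not the proposition itself; the function names are hypothetical.

```python
from itertools import product


def diagonal_set(domain, f):
    """D = {x in domain : x not in f(x)} for a map f given as a dict."""
    return frozenset(x for x in domain if x not in f[x])


def powerset(domain):
    """All subsets of domain, as frozensets."""
    items = list(domain)
    return [frozenset(x for x, keep in zip(items, mask) if keep)
            for mask in product([False, True], repeat=len(items))]


def no_map_hits_its_diagonal(domain):
    """Check every f : domain -> P(domain); D must never be a value of f."""
    subsets = powerset(domain)
    for images in product(subsets, repeat=len(domain)):
        f = dict(zip(domain, images))
        if diagonal_set(domain, f) in images:
            return False
    return True
```

For a three-element domain this inspects all 8³ = 512 candidate functions.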


VII.3.8 Example (Informal). For the purpose of this example we will state
without proof a few facts. To begin with, each real number r in [0, 1] has a binary
expansion, or can be represented in binary notation, as r = 0.b_0 b_1 ... b_i ...
This notation, or expansion, means (quite analogously with the familiar decimal
case) that r = Σ_{i=0}^{∞} b_i/2^(i+1), where each b_i is 0 or 1, and is called the ith binary
digit or bit. An expansion 0.b_0 b_1 ... b_i ... is finite if for some n, b_i = 0 for all
i > n; otherwise it is infinite. Infinite expansions are unique.


† To connect with the discussion on p. 451, here P on ω is given by nPm iff n ∈ f(m); thus
P(m) = f(m).
Our purpose is to show that [0, 1] ~ ^ω 2 ~ P(ω). Indeed, to each A ∈ P(ω)
we associate its characteristic function on ω, χ_A, defined by

    χ_A(n) = 0   if n ∈ A
             1   otherwise

It is clear that A ↦ χ_A is a 1-1 correspondence from P(ω) to ^ω 2, which proves
that the rightmost ~ holds.


We next consider [0, 1]_inf and [0, 1]_fin, the sets of all infinite and finite binary
expansions, respectively, of all the reals in [0, 1]. For example,

    0 = 0.00... (all 0’s),   1 = 0.11... (all 1’s),

    1/2 = 0.100... (all 0’s)   but also   1/2 = 0.011... (all 1’s)

Since the expansions in [0, 1]_fin represent rationals (see Exercise VII.32),
we have [0, 1]_fin ~ ω and therefore

    [0, 1]_inf ~ [0, 1]_inf ∪ [0, 1]_fin          (1)

by Lemma VII.2.24. Since every non-zero real has a unique infinite binary
expansion (see also Exercise VII.33), (1) yields (0, 1] ~ [0, 1]_inf ∪ [0, 1]_fin, and
one more application of Lemma VII.2.24 ([0, 1] ~ (0, 1]) yields

    [0, 1] ~ [0, 1]_inf ∪ [0, 1]_fin

To conclude, observe that f ↦ 0.f(0)f(1)...f(i)... is a 1-1 correspondence
^ω 2 → [0, 1]_inf ∪ [0, 1]_fin.


VII.3.9 Remark. The technique of Example VII.3.8 showed that ^x 2 ~ P(x)
for any set x.
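The correspondence between subsets and characteristic functions can be checked on a finite set. The sketch follows the convention of Example VII.3.8 that the value 0 marks membership; representing a function on x as a tuple indexed by a fixed listing of x is an assumption of the illustration, as are the names `chi` and `subset_of`.

```python
from itertools import product


def chi(A, x):
    """Characteristic function of A on x, as a tuple; 0 means 'belongs to A'."""
    return tuple(0 if n in A else 1 for n in x)


def subset_of(bits, x):
    """Inverse direction: recover the subset from a 0-1-valued tuple."""
    return frozenset(n for n, b in zip(x, bits) if b == 0)


x = (0, 1, 2)
all_functions = list(product((0, 1), repeat=len(x)))   # every member of ^x 2
```

Each of the 2³ tuples corresponds to exactly one subset of x, and the two maps undo each other.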


VII.3.10 Theorem (Cantor’s Theorem). For any set x, x ≁ P(x).


Proof. By contradiction, let there be a 1-1 correspondence f : x → P(x). This
leads to the family of sets (f(a))_{a∈x}, to which we can readily apply the technique
of Proposition VII.3.7:

Let D = {a ∈ x : a ∉ f(a)}. Thus D ≠ f(a) for all a ∈ x, yet D ⊆ x; thus
it must be an f(a) after all. This contradiction establishes the claim.
We note that, relying on Example VII.3.8, we could have proved Cantor’s
theorem as follows: Let instead x ~ P(x). Then also x ~ ^x 2. So let a ↦ g_a
be a 1-1 correspondence x → ^x 2. The diagonal function d = λa.1 − g_a(a) is
different from each g_a (at a), yet it is a total 0-1-valued function on x, so it is a
g_a. Contradiction.

The reader will recognize that the two arguments are essentially identical.
Indeed, d = χ_D.


VII.3.11 Example (Informal). We show here that (−1, 1) ~ R, where
(−1, 1) = {x ∈ R : −1 < x < 1}. Indeed, let f on R be defined by

    f(x) = x / (1 + |x|)

Trivially, f is total. Next, we see that it is 1-1. Indeed, let

    a / (1 + |a|) = b / (1 + |b|)          (1)

where a and b are in R. This leads to

    a − b = b|a| − a|b|          (2)

By (1), ab ≥ 0, so we analyze (2) under just two cases.

Case 1: a ≥ 0 and b ≥ 0. Then a − b = ba − ab = 0.
Case 2: a ≤ 0 and b ≤ 0. Then a − b = −ba − (−ab) = 0.

Both cases lead to a = b, so f is 1-1.

Finally, f is onto (−1, 1). Indeed, let c ∈ (−1, 1). The reader can easily verify
that if c = 0, then f(0) = c; if −1 < c < 0, then f(c/(1 + c)) = c; if 0 < c < 1,
then f(c/(1 − c)) = c.
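The three preimages given above can be checked numerically. The sketch assumes f(x) = x/(1 + |x|), which is the map consistent with equation (1); it works in floating point, so equalities hold only up to rounding, and the name `f_inverse` is hypothetical.

```python
def f(x):
    """The map R -> (-1, 1) given by x / (1 + |x|)."""
    return x / (1 + abs(x))


def f_inverse(c):
    """Preimage of c in (-1, 1), following the three cases in the text."""
    if c == 0:
        return 0.0
    if c < 0:
        return c / (1 + c)
    return c / (1 - c)
```

For instance, `f_inverse(0.75)` is 3.0, and `f(3.0)` returns 0.75.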


VII.4. Cardinals 


In this section, following von Neumann, we assign a measure of cardinality to 
each set, its cardinal number. 

At the very least, cardinal numbers must be ~-invariants (i.e., equinumerous 
sets must measure identically). It is also desirable that this measure be consistent 
with the measures we have already accepted for the cardinality of finite sets, 
since the latter perfectly fit with our intuition. 

The requirement that cardinal numbers be ~-invariants means that for any 
set A, its cardinal number depends on the class of all sets equinumerous to 
A rather than just on A. It was therefore natural that, at first, mathematicians
defined (Frege-Russell definition) the cardinal number of a set A to be “the ‘set’
of all sets equinumerous to A”. This, of course, eventually led to trouble because
these cardinal numbers were “too big” to be sets. For example, the cardinal
number of {∅} would be, according to this definition, the “set” of all singletons
(one-element sets), but this “set” is in 1-1 correspondence with the class of all
sets and urelements (via x ↦ {x}) and therefore is not a set, by the collection
axiom.

Thus cardinal numbers, as “defined” above, cannot be objects of study in our 
theory. However, in the old way of doing set theory (where any collection, in 
principle, was a set and therefore was entitled to be studied in the theory) there 
were still problems, as even cardinal numbers of singletons would be closely 
associated with the “self-contradictory notion” of the “set of all sets” (the reader, 
once again, is referred to the discussion in Wilder (1963, pp. 98-100)).

The way out of this difficulty (von Neumann) is simple. Rather than take for
the cardinal number of A the class of all sets equinumerous to A, just take a 
“canonical” or “normalized” representative from this class (the terms in quotes 
mean that the representative ought not to depend too strongly on A itself, 
so that ~-invariance can be assured). By Zermelo’s theorem (VI.5.54), each 
such class contains ordinals; so take the least such ordinal to measure the 
cardinality. 


VII.4.1 Definition. For any set x, its cardinal number, or cardinality, is defined
to be min{α : α ~ x} and is denoted by Card(x). Cardinal numbers are also
simply referred to as cardinals. Thus a cardinal is just the cardinal number of
some set.

We shall use (in argot) lowercase fraktur letters to denote arbitrary cardinals,
i.e., cardinal-typed variables (e.g., 𝔞, 𝔟, 𝔪), but also lowercase Greek letters
around the middle of the alphabet, typically, κ and λ. The class of all cardinals
will be denoted by Cn.


By definition, Cn ⊆ On.
Here are some useful and immediate consequences. 


VII.4.2 Proposition. For any sets x and y the following hold:


(a) x ~ Card(x).
(b) x ~ y iff Card(x) = Card(y).
Proof. Part (a) follows immediately from the definition.

For (b), assume x ~ y. Then Card(x) = min{α : α ~ x} = min{α : α ~ y} =
Card(y). This settles the only-if part. For the if part, use (a) to write x ~ Card(x) =
Card(y) ~ y.


Part (b) shows that Card( ) is a ~-invariant.


VII.4.3 Proposition. For any ordinal α, α ∈ Cn iff there is no β < α such that
β ~ α.


Proof. Let α ∈ Cn be due to α = Card(x) for some set x. Consider the set (why
set?) S = {γ : γ ~ x}. We have

    α = min S          (1)

If some β < α satisfies β ~ α, then this contradicts (1), since β ∈ S. This
argument establishes the only-if part.

For the if part, let there be no β < α such that β ~ α. Then α is smallest in
{γ : γ ~ α}, i.e., α = Card(α) and therefore α ∈ Cn.


Proposition VII.4.3 gives a characterization of cardinals which is independent
of the sets whose cardinality these cardinals measure. It is helpful in showing
that specific ordinals are cardinals. Because of the proposition, cardinals are
also called initial ordinals.


VII.4.4 Example. Every natural number, i.e., finite ordinal, is a cardinal.
Indeed, by Corollary VII.1.8 we obtain that α ≁ β whenever α ∈ β ∈ ω.
Therefore, fixing attention on β, it is a cardinal by Proposition VII.4.3.
It follows that β = Card(β), but then (Definition VII.1.3) Card(β) = |β|,
so that the definition of cardinals for finite sets is indeed a special case of
Definition VII.4.1, as we hoped it would be.


So far we have obtained that Cn ≠ ∅, in particular, ω ⊆ Cn.


VII.4.5 Example. We now establish that ω ∈ Cn. Indeed, just invoke
Proposition VII.1.13 to see that α ∈ ω implies α ≁ ω.


We have just witnessed that ω is the smallest infinite cardinal (i.e.,
smallest infinite ordinal that is a cardinal). Definitions VII.1.3 and VII.2.1 are
also worth restating in the present context: x is finite iff Card(x) < ω; it is
enumerable iff Card(x) = ω; it is countable iff Card(x) ≤ ω; it is uncountable
iff Card(x) > ω.


VII.4.6 Proposition. For any ordinal α,


(i) Card(α) ≤ α,
(ii) Card(α) = α iff α ∈ Cn.


Proof. (i): Card(α) = min S, where S = {γ : γ ~ α}. But α ∈ S.
(ii): If part. Let α ∈ Cn. If Card(α) < α, then Card(α) ~ α contradicts
Proposition VII.4.3.
Only-if part. Trivially, the hypothesis says “α is a cardinal”.


VII.4.7 Example. For any cardinal 𝔪, Card(𝔪) = 𝔪. Indeed, this just rephrases
Proposition VII.4.6(ii).
This observation is often usefully applied as Card(Card(x)) = Card(x), where
x is any set.


VII.4.8 Proposition. Every infinite cardinal is a limit ordinal.


Proof. The claim is known to be true for ω. So let ω < α, where α is a cardinal,
and assume instead that α = β + 1 for some β. Now, β is infinite; otherwise
α = β + 1 < ω.

By Lemma VII.2.24, β ∪ {β} ~ β; therefore α ~ β, which along with β < α
contradicts that α is a cardinal.


The above result shows that there are many “more” ordinals than cardinals.
For example, ω + 1, ω + 2, and ω + i for any nonzero i ∈ ω are not cardinals.

It also suggests the question of whether indeed there are any cardinals above
ω. This question will be eventually answered affirmatively. As a matter of fact,
there are so many cardinals that Cn is a proper class.

The following result is very important for the further development of the 
theory of cardinal numbers. 


VII.4.9 Theorem. For any sets A and B, A ⊆ B implies that Card(A) ≤
Card(B).


Proof. Let 𝔟 = Card(B) and f : B → 𝔟 be a 1-1 correspondence. Define < on
B by

    x < y iff f(x) ∈ f(y)          (1)

By VI.3.12, < well-orders B and hence A. Let α = ran(φ), where φ on A is
given by the inductive definition

    φ(y) = {φ(x) : x < y ∧ x ∈ A}          (2)

We know (VI.4.32) that ran(φ) ∈ On (which justifies calling it “α”) and that
φ(x) ∈ On for all x ∈ A.


We next show that

    for all x ∈ A, φ(x) ⊆ f(x)          (3)

We do so by induction over A with respect to <. So assume (3) for all x in A
such that x < y ∈ A (this is the I.H.), and prove it for y. Now if y is minimum
in A (basis), then φ(y) = ∅, from which the claim follows in this case. Let then
y be non-minimum and z ∈ φ(y). It follows from (2) that z = φ(x) for some
x ∈ A where x < y. By the I.H. z ⊆ f(x) ∈ f(y) ((1) contributes “∈”), and
since f(x) and f(y) are ordinals (being members of 𝔟), we obtain z ∈ f(y),
which concludes the inductive proof of (3).


We next observe that α ≤ 𝔟, i.e., α ⊆ 𝔟. Indeed, let γ ∈ α. Then γ = φ(x)
for some x ∈ A, so that γ ≤ f(x) by (3). Since f(x) ∈ 𝔟, i.e., f(x) < 𝔟, we
get γ < (i.e., ∈) 𝔟.

This last result, along with Propositions VII.4.2 and VII.4.6, yields
Card(A) = Card(α) ≤ α ≤ 𝔟 = Card(B).

The above theorem will provide the basic tools to compare cardinalities of 
sets. To this end we introduce a definition. 


VII.4.10 Definition. For two sets $A$ and $B$, $A \preceq B$ means that there is a total and 1-1 $f : A \to B$.


Intuitively, whenever $A \preceq B$, $B$ has at least as many elements as $A$. We will indeed see in VII.4.14 that $\mathrm{Card}(A) \le \mathrm{Card}(B)$ is derivable under the circumstances. Let us, however, first state some trivial but useful observations.


VII.4.11 Proposition.

(i) $\preceq$ is reflexive and transitive.
(ii) If $A \subseteq B$, then $A \preceq B$.
(iii) $A \preceq B$ iff for some $C \subseteq B$, $A \sim C$.
(iv) If $A \sim B$, then $A \preceq B$.

Proof. Exercise VII.37.


VII.4.12 Example. Case (iv) in Proposition VII.4.11 cannot be improved to “iff”. For example, $x \mapsto \{x\}$ establishes $a \preceq \mathcal{P}(a)$ for any set $a$. We know however that $a \not\sim \mathcal{P}(a)$ by Cantor’s theorem (VII.3.10). This motivates the following definition.


VII.4.13 Definition. If $A \preceq B$ but $A \not\sim B$, then we write $A \prec B$.


VII.4.14 Proposition. For any sets $A \ne \emptyset$ and $B$, the following are equivalent:

(i) $A \preceq B$.
(ii) There is an onto function $f : B \to A$.
(iii) $\mathrm{Card}(A) \le \mathrm{Card}(B)$.


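Proof. The equivalence of (i) and (ii) follows directly from V.3.9. Next, let us assume (i) and prove (iii). By Proposition VII.4.11(iii), $A \sim C \subseteq B$ for some $C$. Hence, using Propositions VII.4.2 and VII.4.9, $\mathrm{Card}(A) = \mathrm{Card}(C) \le \mathrm{Card}(B)$. Conversely, assume now (iii), i.e., $\mathrm{Card}(A) \subseteq \mathrm{Card}(B)$. The diagram below shows that $g \circ i \circ f : A \to B$ is total and 1-1, thus establishing (i), where $i : \mathrm{Card}(A) \to \mathrm{Card}(B)$ is the inclusion map $x \mapsto x$ and $f : A \to \mathrm{Card}(A)$ and $g : \mathrm{Card}(B) \to B$ are 1-1 correspondences:

\[
\begin{array}{ccc}
A & \xrightarrow{\; g \circ i \circ f \;} & B \\
{\scriptstyle f} \downarrow & & \uparrow {\scriptstyle g} \\
\mathrm{Card}(A) & \xrightarrow{\quad i \quad} & \mathrm{Card}(B)
\end{array}
\]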


VII.4.15 Corollary. For any sets $A$ and $B$, $A \prec B$ iff $\mathrm{Card}(A) < \mathrm{Card}(B)$.


Proof. If part. The hypothesis yields $A \preceq B$ and $A \not\sim B$ (otherwise the cardinals of the two sets would be equal). Hence $A \prec B$.

Only-if part. The hypothesis yields $\mathrm{Card}(A) \le \mathrm{Card}(B)$ and $\mathrm{Card}(A) \ne \mathrm{Card}(B)$.


VII.4.16 Corollary (Cantor). For any set $a$, $\mathrm{Card}(a) < \mathrm{Card}(\mathcal{P}(a))$.


Proof. Indeed, $a \prec \mathcal{P}(a)$ by Example VII.4.12.
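
As a finite sanity check of Example VII.4.12 and Corollary VII.4.16, the following Python sketch verifies on a three-element set that the singleton map is 1-1 into the power set and that no map from the set into its power set is onto. It uses the standard diagonal set from Cantor's argument. The helper names `power_set` and `diagonal_set` are assumptions of this sketch, not notation from the text.

```python
from itertools import combinations, product

def power_set(a):
    """Return all subsets of the finite set a, as frozensets."""
    items = sorted(a)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def diagonal_set(a, f):
    """Diagonal set D = {x in a : x not in f(x)} for a map f : a -> P(a)."""
    return frozenset(x for x in a if x not in f[x])

a = {0, 1, 2}
subsets = power_set(a)

# x |-> {x} is total and 1-1 from a into P(a).
singleton = {x: frozenset({x}) for x in a}
assert len(set(singleton.values())) == len(a)

# No f : a -> P(a) is onto: its diagonal set is never a value of f.
elems = sorted(a)
for images in product(subsets, repeat=len(elems)):
    f = dict(zip(elems, images))
    assert diagonal_set(a, f) not in f.values()
```

The exhaustive loop covers all 512 maps from a three-element set into its eight subsets. It illustrates the finite case only and is no substitute for the general proof.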

By Corollary VII.4.16, there are infinitely many cardinals. Indeed, for any $a$, $\mathrm{Card}(\mathcal{P}(a))$ is a bigger cardinal. The preceding proposition relates comparisons of sets (as to size) with comparisons of their cardinal numbers and leads to the following important result, which has several names attached to it: Cantor, Dedekind, Schröder, and Bernstein.


VII.4.17 Theorem (Cantor-Bernstein). For any sets $A$ and $B$, if $A \preceq B$ and $B \preceq A$ then $A \sim B$, and conversely.


Proof. The “conversely” part directly follows from Proposition VII.4.11. For the rest, observe that $A \preceq B$ and $B \preceq A$ yield $\mathrm{Card}(A) \le \mathrm{Card}(B)$ and $\mathrm{Card}(B) \le \mathrm{Card}(A)$ respectively, by Proposition VII.4.14; thus $\mathrm{Card}(A) = \mathrm{Card}(B)$.


Our approach to cardinals relies on AC. Some authors define cardinal numbers in a way independent of AC (see, for example, Levy (1979)). In such an approach, there is a more obscure (but AC-free) proof† of the Cantor-Bernstein theorem, which we include here.


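Let $f : A \to B$ and $g : B \to A$ be total and 1-1. We want to conclude that $A \sim B$.
Consider the operator $\Gamma$ over $A$ given by

\[ \Gamma(S) = (A - g[B]) \cup g \circ f[S] \qquad (1) \]

for all $S \subseteq A$. Clearly $\Gamma$ is monotone, so for some $X \subseteq A$ we have $\Gamma(X) = X$. (For example, the least fixed point of $\Gamma$ will do for $X$. You may want to review Theorem VII.1.27.) Set

\[
\begin{aligned}
X' &= f[X] && (2) \\
Y &= A - X && (3) \\
Y' &= B - X' && (4)
\end{aligned}
\]

so that

\[
\begin{aligned}
A &= X \cup Y \ \text{ and } \ X \cap Y = \emptyset && (5) \\
B &= X' \cup Y' \ \text{ and } \ X' \cap Y' = \emptyset && (6)
\end{aligned}
\]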

† Of course, AC enters via the Zermelo theorem in Definition VII.4.1 and in the proof of Theorem VII.4.9, on which the above-given proof of the Cantor-Bernstein theorem is based.

We will show that $Y = g[Y']$. Indeed,

\[
\begin{aligned}
g[Y'] &= g[B - X'] && \text{by (4)} \\
&= g[B] - g[X'] && \text{since } g \text{ is 1-1 and total} \\
&= g[B] - g \circ f[X] && \text{by (2)} \\
&= g[B] \cap (A - g \circ f[X]) && (7)
\end{aligned}
\]

By (7) and De Morgan’s law, $A - g[Y'] = (A - g[B]) \cup g \circ f[X] = \Gamma(X) = X$. By (5), this is what we want.
To conclude, define $h : A \to B$ by

\[
h(x) =
\begin{cases}
f(x) & \text{if } x \in X \\
g^{-1}(x) & \text{if } x \in Y
\end{cases}
\]

where $g^{-1} : \mathrm{ran}(g) \to B$ is, of course, a 1-1 correspondence. Clearly, $A \sim B$ via $h$. This concludes the AC-free proof.
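
The construction in the AC-free proof can be traced on a concrete instance. The Python sketch below assumes, purely for illustration, that A and B are both the natural numbers, with the injections f(n) = 2n and g(n) = 2n + 1; these choices and the names `in_X` and `h` are hypothetical and do not come from the text. Membership in the least fixed point X is decided by unwinding an element backwards through g and f until it either leaves ran(g) (so it lies in A - g[B]) or leaves ran(f).

```python
def f(n):
    """Assumed injection A -> B (A = B = natural numbers)."""
    return 2 * n

def g(n):
    """Assumed injection B -> A."""
    return 2 * n + 1

def in_X(x):
    """Decide membership in the least fixed point X of
    S |-> (A - g[B]) U g.f[S], for the f and g above."""
    while True:
        if x % 2 == 0:        # x is not in g[B], hence x is in A - g[B]
            return True
        b = (x - 1) // 2      # x = g(b)
        if b % 2 == 1:        # b is not in ran(f), so x is not in g.f[A]
            return False
        x = b // 2            # x = g(f(x')); continue with the smaller x'

def h(x):
    """The bijection of the proof: f on X, the inverse of g on Y = A - X."""
    return f(x) if in_X(x) else (x - 1) // 2
```

For instance, h(0) = 0, h(1) = 2 and h(3) = 1. On any initial segment the values of h are pairwise distinct, and every b is reached from either b/2 or 2b + 1, in line with B being the disjoint union of X' and Y'.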


VII.4.18 Example (Informal). The self-contradictory notion of the “set of all 
sets”. Let us travel back to the point in time prior to the introduction of the 
axiomatic foundation of set theory. At that point sets and classes meant the 
same thing. The statement x € x was not necessarily false for all sets x; thus 
the notion of the set of all sets would not be disallowed via this route. Instead, 
a cardinality argument was then applied to show that the set of all sets could 
not possibly exist: Indeed, if $V$ is the set of all sets, then $\mathcal{P}(V) \subseteq V$, therefore $\mathcal{P}(V) \preceq V$. Since also (VII.4.12) $V \preceq \mathcal{P}(V)$, we get $V \sim \mathcal{P}(V)$, contradicting Cantor’s theorem.


VII.4.19 Example (Informal). Let us see that $(0, 1] \times (0, 1] \sim (0, 1]$.
Indeed, $(0, 1]^2 \preceq (0, 1]$ via the function $(0.a_0 a_1 \ldots a_i \ldots,\ 0.b_0 b_1 \ldots b_i \ldots) \mapsto 0.a_0 b_0 a_1 b_1 \ldots a_i b_i \ldots$, which is clearly total and 1-1 on the understanding that we only utilize infinite expansions. On the other hand, $(0, 1] \preceq (0, 1]^2$ via $x \mapsto (x, 1)$. The result follows from the Cantor-Bernstein theorem.
Compare the proof just given with the one you gave in Exercise VII.35.
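
The digit-interleaving map of this example can be illustrated on finite prefixes of the expansions. The Python sketch below works on digit strings of equal length; real numbers have infinite expansions, so this shows only the mechanics of the map. The function names are assumptions of the sketch.

```python
def interleave(a_digits, b_digits):
    """Send the prefixes a0 a1 ... and b0 b1 ... to a0 b0 a1 b1 ...
    Both arguments are digit strings of the same length."""
    if len(a_digits) != len(b_digits):
        raise ValueError("prefixes must have the same length")
    return "".join(x + y for x, y in zip(a_digits, b_digits))

def deinterleave(digits):
    """Recover the two prefixes from an interleaved string of even length."""
    return digits[0::2], digits[1::2]
```

Since `deinterleave` undoes `interleave`, distinct pairs of prefixes give distinct interleavings, which is the finite shadow of the 1-1 property used above.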


We next turn our attention to the transfinite sequence of cardinals. 


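VII.4.20 Proposition. If $S$ is a set of cardinals, then $\bigcup S$ is a cardinal. Moreover $\mathfrak{a} \le \bigcup S$ for all $\mathfrak{a} \in S$.


Proof. By VI.5.22, $\bigcup S$ is an ordinal. We show (see Proposition VII.4.6) that $\mathrm{Card}(\bigcup S) = \bigcup S$ by contradiction.

So let $\mathrm{Card}(\bigcup S) < \bigcup S$, i.e., $\mathrm{Card}(\bigcup S) \in \bigcup S$. By the definition of $\bigcup$,

\[ \mathrm{Card}\Bigl(\bigcup S\Bigr) \in \mathfrak{a} \in S \quad \text{for some } \mathfrak{a} \qquad (1) \]

and hence

\[ \mathrm{Card}\Bigl(\bigcup S\Bigr) < \mathfrak{a} \subseteq \bigcup S \quad \text{for some } \mathfrak{a} \qquad (2) \]

By Theorem VII.4.9, (2) yields a contradiction:

\[ \mathrm{Card}\Bigl(\bigcup S\Bigr) < \mathfrak{a} \le \mathrm{Card}\Bigl(\bigcup S\Bigr) \]

Finally, that $\mathfrak{a} \le \bigcup S$ for all $\mathfrak{a} \in S$ follows from Theorem VI.5.22.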


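VII.4.21 Corollary. Cn is not a set.


Proof. If it were a set, then $\mathfrak{a} = \bigcup \mathrm{Cn}$ for some $\mathfrak{a} \in \mathrm{Cn}$. But then $\mathfrak{a} < \mathrm{Card}(\mathcal{P}(\mathfrak{a})) \in \mathrm{Cn}$ contradicts the previous proposition (which yields $\mathrm{Card}(\mathcal{P}(\mathfrak{a})) \le \bigcup \mathrm{Cn} = \mathfrak{a}$).


VII.4.22 Definition. For any cardinal $\mathfrak{a}$, its cardinal successor, $\mathfrak{a}^+$, is the smallest cardinal $> \mathfrak{a}$.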


The above definition makes sense by Corollary VII.4.16 and the remark 
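following it, since Cn is well-ordered by $<$ (i.e., $\in$). We can now define the alephs:


VII.4.23 Definition. The aleph transfinite sequence is given by the total function $\alpha \mapsto \aleph_\alpha$ on On defined inductively as follows:

\[
\begin{aligned}
\aleph_0 &= \omega \\
\aleph_{\alpha+1} &= (\aleph_\alpha)^+ \\
\aleph_\alpha &= \bigcup \{\aleph_\beta : \beta < \alpha\} \quad \text{if } \mathrm{Lim}(\alpha)
\end{aligned}
\]

Each $\aleph_\alpha$ is an aleph.


A cardinal $\aleph_\alpha$ with $\mathrm{Lim}(\alpha)$ is called a limit cardinal, while one such as $\aleph_{\alpha+1}$ is called a successor cardinal.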


VII.4.24 Remark. (1) The reader will note that the term “limit cardinal” applies to the index of an infinite cardinal in the aleph sequence. It does not refer to the cardinal itself (all infinite cardinals are limit ordinals by VII.4.8).

(2) If $\mathrm{Lim}(\alpha)$, then $\aleph_\alpha = \bigcup \{\aleph_{\beta+1} : \beta < \alpha\}$. Indeed, by VII.4.20, $\bigcup \{\aleph_{\beta+1} : \beta < \alpha\}$ is a cardinal, say

\[ \mathfrak{a} = \bigcup \{\aleph_{\beta+1} : \beta < \alpha\} \]

Since $\beta < \alpha$ implies $\beta + 1 < \alpha$, $\mathfrak{a} \le \aleph_\alpha$. On the other hand, $\gamma \in \aleph_\alpha$ implies that, for some $\beta < \alpha$, $\gamma \in \aleph_\beta$. But $\aleph_\beta < \aleph_{\beta+1}$ by definition, and $\aleph_{\beta+1}$ is transitive. Thus, $\gamma \in \aleph_{\beta+1} \subseteq \mathfrak{a}$.


VII.4.25 Proposition. The function $\alpha \mapsto \aleph_\alpha$ is strictly increasing.


Proof. By definition, $\aleph_\alpha < \aleph_{\alpha+1}$. The result follows by VI.5.38.


In particular, $\alpha \mapsto \aleph_\alpha$ is normal.


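The next theorem shows how to “compute” $\mathfrak{a}^+$.

VII.4.26 Theorem. For all $\mathfrak{a}$, $\mathfrak{a}^+ = \{\alpha : \mathrm{Card}(\alpha) \le \mathfrak{a}\}$.


Proof. Let us set $S = \{\alpha : \mathrm{Card}(\alpha) \le \mathfrak{a}\}$.

First, $S$ is transitive: Indeed, let $\alpha \in \beta \in S$. This yields $\alpha \subseteq \beta$; hence, $\mathrm{Card}(\alpha) \le \mathrm{Card}(\beta) \le \mathfrak{a}$, the last $\le$ by definition of $S$. Thus $\alpha \in S$.

Second, $S$ is a set: Indeed, if $\alpha \in S$, then $\alpha < \mathfrak{a}^+$, for otherwise $\mathfrak{a}^+ \le \mathrm{Card}(\alpha) \le \mathfrak{a}$. Therefore, $S \subseteq \mathfrak{a}^+$, and hence $S$ is a set. It follows that $S$ is an ordinal.

Let next $\mathrm{Card}(S) \in S$. Then $\mathrm{Card}(\mathrm{Card}(S)) \le \mathfrak{a}$ and therefore $\mathrm{Card}(S) \le \mathfrak{a}$ (see VII.4.7), which yields $S \in S$, a contradiction.

Therefore $S$ is a cardinal. Clearly $\mathfrak{a} < S$, as the previous paragraph shows. By VII.4.22, $\mathfrak{a}^+ \le S$.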


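By the previous theorem, $\aleph_1 = \{\alpha : \mathrm{Card}(\alpha) \le \omega\}$. That is, $\aleph_1$ is the set of all countable ordinals.

It is also noted that for each $\alpha$, $(\aleph_\alpha)^+ \le \mathrm{Card}(\mathcal{P}(\aleph_\alpha))$, since $\aleph_\alpha < \mathrm{Card}(\mathcal{P}(\aleph_\alpha))$ by Cantor’s theorem, while $(\aleph_\alpha)^+$ is the smallest cardinal above $\aleph_\alpha$.

The conjecture (Hausdorff)

\[ \aleph_{\alpha+1} = \mathrm{Card}(\mathcal{P}(\aleph_\alpha)) \]

is the generalized continuum hypothesis, or GCH, whereas the special case conjectured by Cantor,

\[ \aleph_1 = \mathrm{Card}(\mathcal{P}(\aleph_0)) \qquad (1) \]

is the continuum hypothesis, or CH.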

Gödel (1938, 1939, 1940) showed, using L, that GCH is consistent with the Zermelo-Fraenkel (+ AC) axioms of set theory, and Cohen (1963) showed that $\neg$GCH is also consistent with ZFC. Thus GCH is independent of the ZFC axioms; these axioms can neither prove it nor disprove it. So, as with AC, one can adopt either GCH or $\neg$GCH as an axiom. This is not generally done, however. The other axioms of ZFC (including AC) are widely accepted as “really true”, being counterparts of reasonable principles (e.g., substitution, foundation), whereas our intuition does not help us at all to choose between GCH and $\neg$GCH. Our principles (or axioms) are not adequate to settle this question, and one hopes that additional intuitively “true” axioms will eventually be discovered and added which will settle GCH. It is noted that if one adopts GCH for the sake of experimentation, then several things become simpler in set theory (e.g., cardinal arithmetic; see Section VII.6), and even the axiom of choice becomes a theorem† (the interested reader is referred to Levy (1979, p. 190)).

In the “real realm”, because $\mathcal{P}(\omega) \sim \mathbb{R}$ (by VII.3.8, VII.3.11, and Exercise VII.34), CH can be rephrased to read: there is no cardinal between $\omega$ and $\mathrm{Card}(\mathbb{R})$, or also: every subset of $\mathbb{R}$ either has the cardinality of $\mathbb{R}$ or is countable.


Digression. We briefly look into an alternative definition of cardinals, which
is not based on the axiom of choice. This digression can be skipped without 
harm, as it is not needed for the rest of our development of set theory. In- 
deed, it is incompatible with Definition VII.4.1, which we are following (see 
Remark VII.4.28(3) below). Yet, the reader who is interested in foundational 
questions will find the material here illuminating. 


VII.4.27 Definition (Frege-Russell-Scott). For any set A, its cardinal number 
or cardinality, in symbols $\mathrm{Card}(A)$, is the class of all sets of least rank ($\rho$)
equinumerous to A. 

A cardinal is a class which is the cardinal number of some set. 


VII.4.28 Remark. (1) The above definition is essentially the original due to 
Frege and Russell suggested in the preamble to this section, where the “size” of 
cardinals has been drastically reduced down to set size (see VI.6.29). We state 
this as Proposition VII.4.29 below. 

(2) The cardinal number of a set $A$ does not necessarily contain $A$ (i.e., cardinals of Definition VII.4.27 are not equivalence classes). To see this, look, for example, at $\mathrm{Card}(\{\omega\})$. By Proposition VI.6.21, $\rho(\omega) = \omega + 1$; thus $\rho(\{\omega\}) = \omega + 2$ (VI.6.24).

Now let $a$ be any urelement. Then $\rho(\{a\}) = 1$ and $\{\omega\} \sim \{a\}$. Thus $\{a\} \in \mathrm{Card}(\{\omega\})$, but $\{\omega\} \notin \mathrm{Card}(\{\omega\})$.

(3) $\mathrm{Card}(\emptyset) = \{\emptyset\}$, an ordinal. However, if $x \ne \emptyset$ then $\emptyset \notin \mathrm{Card}(x) \ne \emptyset$, since $x \not\sim \emptyset$, i.e., cardinals of nonempty sets are not ordinals.


† For this to be noncircular, cardinals must be introduced in an AC-free manner. See the following Digression.


VII.4.29 Proposition. For any set x, Card(x) is a set. 


Proof. See V1.6.29. 


As before, Cn denotes the class of all cardinals.


VII.4.30 Proposition. For any sets $x$ and $y$, $x \sim y$ iff $\mathrm{Card}(x) = \mathrm{Card}(y)$.


Proof. If part. Let $a \in \mathrm{Card}(x) = \mathrm{Card}(y)$. Then $x \sim a \sim y$.
Only-if part. Directly from Definition VII.4.27.


The above is the counterpart of Proposition VII.4.2(b), this time under Def- 
inition VII.4.27. It has been shown by Pincus (1974) that one cannot define 
“Card()” in ZF so that it satisfies x ~ Card(x) as well. 

One now proceeds by adopting Definitions VII.4.10 and VII.4.13 for $\preceq$ and $\prec$. In particular, Proposition VII.4.11 is derivable. Next, $<$ on cardinals is defined through $\prec$ as in VII.4.31 below. We also observe that if $a \sim a'$ and $b \sim b'$, then $a \preceq b$ yields $a' \preceq b'$ and $a \prec b$ yields $a' \prec b'$ (Exercise VII.39).


VII.4.31 Alternative Definition (Cantor). $\mathrm{Card}(a) < \mathrm{Card}(b)$ means $a \prec b$. $\mathrm{Card}(a) \le \mathrm{Card}(b)$ means $a \preceq b$.


The above definition embodies the equivalence of (i) and (iii) of Proposition VII.4.14. Here Theorem VII.4.9 trivially holds via VII.4.11(ii). “Cantor’s theorem” (VII.4.16) also holds. The Cantor-Bernstein theorem is proved by the AC-free proof that follows VII.4.17 (p. 463). This yields that $<$ is a partial order on Cn. Indeed, irreflexivity is immediate ($\mathrm{Card}(a) < \mathrm{Card}(a)$ requires $a \not\sim a$). Transitivity is obtained as follows:

Let $\mathfrak{a} < \mathfrak{b} < \mathfrak{c}$ and therefore $\mathrm{Card}(a) < \mathrm{Card}(b) < \mathrm{Card}(c)$ for appropriate $a$, $b$ and $c$. By VII.4.31, $a \prec b \prec c$; hence $a \preceq c$ by Proposition VII.4.11(i). If $a \sim c$, then (Exercise VII.39) $b \preceq a$, and hence $a \sim b$ by the Cantor-Bernstein theorem, contradicting $a \prec b$. Thus $a \prec c$, i.e., $\mathfrak{a} < \mathfrak{c}$.

Proposition VII.4.20 has the following counterpart: 

VII.4.32 Proposition. If $S$ is a nonempty set of cardinals, then there is a cardinal $\mathfrak{b}$ such that $\mathfrak{a} \le \mathfrak{b}$ for all $\mathfrak{a} \in S$.


Proof. Let $T = \{\rho(\mathfrak{a}) : \mathfrak{a} \in S\}$. $T$ is a set (of ordinals) by collection; hence $\bigcup T$ is an ordinal $\alpha$ such that $\beta \le \alpha$ for all $\beta \in T$ (Theorem VI.5.22).
Thus $x \in \mathfrak{a} \in S$ implies $\rho(x) < \rho(\mathfrak{a}) \le \alpha$, so that


x € Vy(a) () 


By(1),a C Vy(q@)foranya € S.Thus 6 = Card(Vy(q@)) will do, by VI.4.11 (ii). 


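From the above, one obtains, once again, Corollary VII.4.21: If Cn is a set, then let $\mathfrak{b}$ satisfy $\mathfrak{a} \le \mathfrak{b}$ for all $\mathfrak{a} \in \mathrm{Cn}$. Then since $\mathrm{Card}(\mathcal{P}(\mathfrak{b})) \in \mathrm{Cn}$, we get $\mathrm{Card}(\mathcal{P}(\mathfrak{b})) \le \mathfrak{b}$, contradicting Cantor’s theorem.


VII.4.33 Exercise. Comment on the alternative proof of Proposition VII.4.32 that proceeds as follows: Represent each $\mathfrak{a} \in S$ as $\mathrm{Card}(a)$ for an appropriate set $a$. Let $T$ be the union of all these $a$’s. $T$ is a set and $a \subseteq T$ for each $a$. Thus $\mathfrak{a} = \mathrm{Card}(a) \le \mathrm{Card}(T)$ by VII.4.11(ii). Therefore, $\mathfrak{b} = \mathrm{Card}(T)$ will do.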


The following is the counterpart of Theorem VII.4.26. 


VII.4.34 Theorem. (Hartogs (1915)). For any set $x$ there is an ordinal $\alpha$ such that $\alpha \not\preceq x$.


Proof. Let $S = \{\beta : \beta \preceq x\}$. First, by VII.4.11(i), $S$ is transitive.

We next verify that $S$ is a set and therefore an ordinal, say $S = \alpha$. To see this, consider the class $W = \{(A, R) : A \subseteq x \wedge R \subseteq A \times A \text{ is a well-ordering of } A\}$. Since $W \subseteq \mathcal{P}(x) \times \mathcal{P}(x \times x)$, $W$ is a set.

If $(A, R) \in W$, then $(A, R) \cong \beta$ for a unique $\beta$ via a unique order isomorphism $\phi_{A,R} : A \to \beta$. Clearly $\beta \in S$ by VII.4.11(iii), and conversely each $\beta \in S$ and any particular total 1-1 function $f : \beta \to x$ induces a well-ordering $R$ on $A = \mathrm{ran}(f) \subseteq x$, so that $\beta = \|(A, R)\|$ (VI.3.12 and VI.4.33).

We conclude that the function $(A, R) \mapsto \|(A, R)\| : W \to S$ is onto, and thus $S$ is a set by collection. If now $\alpha \preceq x$, then $\alpha \in \alpha$. We must conclude that $\alpha \not\preceq x$.


We conclude this digression by observing that $<$ on cardinals, as these were defined in VII.4.27, is a partial order (see the discussion following Definition VII.4.31) but that without the axiom of choice it cannot be shown to be a total order. Of course, in the presence of AC one would rather define cardinals, as we do, by Definition VII.4.1.


VII.4.35 Theorem. (Hartogs (1915)). AC is equivalent to the statement “$<$ of Definition VII.4.31 is a total order between cardinals, as these were defined in VII.4.27”.


Proof. First assume the statement in quotes and prove AC.

Let $x$ be any nonempty set and $\alpha$ be such that $\alpha \not\preceq x$ by Theorem VII.4.34. Thus neither $\mathrm{Card}(\alpha) = \mathrm{Card}(x)$ nor $\mathrm{Card}(\alpha) < \mathrm{Card}(x)$. By assumption $\mathrm{Card}(x) < \mathrm{Card}(\alpha)$, i.e., $x \prec \alpha$, say, via the total 1-1 function $f : x \to \alpha$.

$f^{-1} : \mathrm{ran}(f) \to x$ well-orders $x$ by VI.3.12. This proves that every nonempty set can be well-ordered, and hence proves AC.

Assume AC now. By Zermelo’s theorem, if $x$ and $y$ are sets, then $x \sim \alpha$ and $y \sim \beta$ for some $\alpha$ and $\beta$. Without loss of generality, say $\alpha \le \beta$. Then $\alpha \preceq \beta$. Now invoke Exercise VII.39.


VII.5. Arithmetic on Cardinals 


In set theory and other branches of mathematics one often wants to compute 
the cardinality of a set a which is formed in a particular way from given sets 
whose cardinalities we already know. Failing this, one is often content to at 
least compute an approximation of the cardinality of a, preferably erring on the 
high side. This section develops some tools to carry out such computations. 


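VII.5.1 Definition. For any cardinals $\mathfrak{a}$ and $\mathfrak{b}$, $\mathfrak{a} +_c \mathfrak{b}$ stands for $\mathrm{Card}(\{0\} \times \mathfrak{a} \cup \{1\} \times \mathfrak{b})$, their sum.


The sum operation is denoted by the cumbersome $+_c$ to avoid confusion with ordinal addition.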


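VII.5.2 Proposition. If $A$ and $B$ are disjoint sets, then $\mathrm{Card}(A) +_c \mathrm{Card}(B) = \mathrm{Card}(A \cup B)$.


Proof. Let $\mathfrak{a} = \mathrm{Card}(A)$ and $\mathfrak{b} = \mathrm{Card}(B)$. Since $\mathfrak{a} \sim \{0\} \times \mathfrak{a}$ via $x \mapsto (0, x)$ and $\mathfrak{b} \sim \{1\} \times \mathfrak{b}$ via $x \mapsto (1, x)$, there are 1-1 correspondences $f : A \to \{0\} \times \mathfrak{a}$ and $g : B \to \{1\} \times \mathfrak{b}$. Since $A \cap B = \{0\} \times \mathfrak{a} \cap \{1\} \times \mathfrak{b} = \emptyset$, it follows that $f \cup g$ is a 1-1 correspondence and $A \cup B \sim \{0\} \times \mathfrak{a} \cup \{1\} \times \mathfrak{b}$; thus $\mathfrak{a} +_c \mathfrak{b} = \mathrm{Card}(A \cup B)$.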

VII.5.3 Corollary. For any sets $A$ and $B$, $\mathrm{Card}(A \cup B) \le \mathrm{Card}(A) +_c \mathrm{Card}(B)$.


Proof. The function $f : \{0\} \times A \cup \{1\} \times B \to A \cup B$ given by $(i, x) \mapsto x$ is onto. The claim now follows from VII.5.2 and VII.4.14(i).


VII.5.4 Remark. By VII.5.2, to compute $\mathfrak{a} +_c \mathfrak{b}$ it suffices to compute $\mathrm{Card}(A \cup B)$ for any disjoint $A$ and $B$ that have cardinalities $\mathfrak{a}$ and $\mathfrak{b}$ respectively. This observation proves to be very convenient in practice.


VII.5.5 Example. We verify that $\omega +_c \omega = \omega$. Indeed, observe that $\omega \sim E$ and $\omega \sim O$, where $E$ and $O$ are the even and odd natural numbers respectively, and apply VII.5.2.

It is important to observe that $+_c$ (cardinal addition) is different from $+$ on ordinals ($\omega \ne \omega + \omega$).
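
The even/odd split behind this example can be written out as an explicit 1-1 correspondence between the tagged union of two copies of the natural numbers and the natural numbers. This is a small Python sketch; the names `from_sum` and `to_sum` are assumptions made for illustration.

```python
def from_sum(tag, n):
    """Send (0, n) to the even number 2n and (1, n) to the odd number 2n + 1."""
    if tag not in (0, 1):
        raise ValueError("tag must be 0 or 1")
    return 2 * n + tag

def to_sum(k):
    """Inverse map: recover the tag (parity) and the index n from k."""
    return (k % 2, k // 2)
```

Every natural number k is hit exactly once, by the pair `to_sum(k)`.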


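VII.5.6 Example (Informal). We verify that $\omega +_c \mathrm{Card}(\mathbb{R}) = \mathrm{Card}(\mathbb{R})$.† Indeed, $(0, 1) \cap \omega = \emptyset$ and $(0, 1) \sim \mathbb{R}$ (Exercise VII.34). Thus

\[ \omega +_c \mathrm{Card}(\mathbb{R}) = \mathrm{Card}(\omega \cup (0, 1)) \le \mathrm{Card}(\mathbb{R}) \qquad (1) \]

where the $=$ follows from VII.5.2 and the $\le$ from $\omega \cup (0, 1) \subseteq \mathbb{R}$. Similarly, $\mathrm{Card}(\mathbb{R}) \le \mathrm{Card}(\omega \cup (0, 1))$ by Exercise VII.34 and $(0, 1) \subseteq \omega \cup (0, 1)$. This and (1) establish the claim. Alternatively, $\omega \cup (0, 1) \sim (0, 1)$ by VII.2.24.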


The basic properties of addition, worked out by Cantor, are captured by the 
following theorem. 


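VII.5.7 Proposition (Cantor). For any cardinals the following hold:

(i) $\mathfrak{a} +_c 0 = \mathfrak{a}$
(ii) $\mathfrak{a} +_c \mathfrak{b} = \mathfrak{b} +_c \mathfrak{a}$
(iii) $(\mathfrak{a} +_c \mathfrak{b}) +_c \mathfrak{c} = \mathfrak{a} +_c (\mathfrak{b} +_c \mathfrak{c})$
(iv) If $\mathfrak{a} \le \mathfrak{b}$, then $\mathfrak{a} +_c \mathfrak{c} \le \mathfrak{b} +_c \mathfrak{c}$
(v) $\mathfrak{a} \le \mathfrak{b}$ iff for some $\mathfrak{c}$, $\mathfrak{b} = \mathfrak{a} +_c \mathfrak{c}$.


Proof. (i)-(iii): by VII.5.4.

For (iv), let $A$, $B$, $C$ be mutually disjoint sets such that $\mathfrak{a} = \mathrm{Card}(A)$, $\mathfrak{b} = \mathrm{Card}(B)$, $\mathfrak{c} = \mathrm{Card}(C)$. By VII.4.14, there is an onto $f : B \to A$. Then $f \cup i : B \cup C \to A \cup C$ is an onto function, where $i : C \to C$ is the identity.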


† The cardinality of the set of real numbers is often denoted by $\mathfrak{c}$ in the literature (“c” stands for “continuum”).

For (v), let $\mathfrak{a} \le \mathfrak{b}$ and $A$, $B$ be as above; hence (VII.4.14) there is a total, 1-1 $f : A \to B$. Let $C = B - \mathrm{ran}(f)$ (this might be empty). Set $\mathfrak{c} = \mathrm{Card}(C)$. Now, $C \cap \mathrm{ran}(f) = \emptyset$, and $\mathrm{Card}(\mathrm{ran}(f)) = \mathfrak{a}$. Thus, $\mathfrak{a} +_c \mathfrak{c} = \mathrm{Card}(\mathrm{ran}(f) \cup C) = \mathfrak{b}$. This settles the only-if part. For the if part start with $A \cap C = \emptyset$ such that $\mathfrak{a} = \mathrm{Card}(A)$, $\mathfrak{c} = \mathrm{Card}(C)$, and set $B = A \cup C$, $\mathfrak{b} = \mathrm{Card}(B) = \mathfrak{a} +_c \mathfrak{c}$. Since $i : A \subseteq B$ (the inclusion map, given by $i(x) = x$ for all $x \in A$) is total and 1-1, we get $\mathfrak{a} \le \mathfrak{b}$ by VII.4.14.


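VII.5.8 Proposition. $+_c \upharpoonright \omega^2 = + \upharpoonright \omega^2$.


Proof. $m + n = m \cup \{m + k : k \in n\}$ by V.1.25 (or VI.10.4). The function $f : n \to \{m + k : k \in n\}$, given by $f(k) = m + k$, is a 1-1 correspondence (1-1-ness by VI.10.2). Thus,

\[
\begin{aligned}
m + n &= \mathrm{Card}(m + n) && \text{since } m + n \in \omega \\
&= \mathrm{Card}(m) +_c \mathrm{Card}(\{m + k : k \in n\}) \\
&= m +_c n
\end{aligned}
\]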


We next turn to the multiplication of cardinals. The definition is motivated by the intuitive observation that if $|A| = n$ and $|B| = m$, then $|A \times B| = m \cdot n$. The validity of this observation will be formally verified below.


VII.5.9 Definition. For any cardinals $\mathfrak{a}$ and $\mathfrak{b}$, $\mathfrak{a} \cdot_c \mathfrak{b}$ stands for $\mathrm{Card}(\mathfrak{a} \times \mathfrak{b})$, their product.


The cumbersome “$\cdot_c$” for cardinal multiplication is used to distinguish this operation from “$\cdot$”, ordinal multiplication. We prove the analogue of VII.5.2 first:


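VII.5.10 Proposition. If $\mathfrak{a} \sim A$ and $\mathfrak{b} \sim B$, then $\mathfrak{a} \cdot_c \mathfrak{b} = \mathrm{Card}(A \times B)$.


Proof. Let $f : \mathfrak{a} \to A$ and $g : \mathfrak{b} \to B$ be 1-1 correspondences. It follows that $\lambda \gamma \delta.(f(\gamma), g(\delta))$ is a 1-1 correspondence $\mathfrak{a} \times \mathfrak{b} \to A \times B$.


VII.5.11 Example. In view of VII.2.9, $\omega \cdot_c \omega = \omega < \omega \cdot \omega$; thus

\[ \mathfrak{a} \cdot_c \mathfrak{b} \ne \mathfrak{a} \cdot \mathfrak{b}, \text{ in general.} \]

See however below, the case of finite cardinals.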
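
The equality of the countable product with the countable cardinal rests on a 1-1 correspondence between pairs of natural numbers and natural numbers (VII.2.9). One standard choice, not necessarily the one used in VII.2.9, is the Cantor pairing function. A short Python sketch follows; the names `pair` and `unpair` are assumptions of this illustration.

```python
from math import isqrt

def pair(m, n):
    """Cantor pairing: enumerate pairs along the diagonals m + n = s."""
    s = m + n
    return s * (s + 1) // 2 + n

def unpair(k):
    """Inverse of pair: find the diagonal s, then the position n on it."""
    s = (isqrt(8 * k + 1) - 1) // 2   # largest s with s(s+1)/2 <= k
    n = k - s * (s + 1) // 2
    return s - n, n
```

The pairs on the diagonals with m + n at most 19 are sent exactly onto the first 210 natural numbers, with no gaps and no repetitions.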


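VII.5.12 Proposition. $\cdot_c \upharpoonright \omega^2 = \cdot \upharpoonright \omega^2$.

Proof. We have

\[
\begin{aligned}
m \cdot n &= \mathrm{Card}(m \cdot n) && \text{since } m \cdot n \in \omega \\
&= \mathrm{Card}(n \times m) && \text{by VI.10.18} \\
&= \mathrm{Card}(m \times n) && \text{via } (k, l) \mapsto (l, k) \\
&= m \cdot_c n && \text{by VII.5.9}
\end{aligned}
\]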


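VII.5.13 Proposition (Cantor). For any cardinals, the following hold:

(i) $\mathfrak{a} \cdot_c 0 = 0$
(ii) $\mathfrak{a} \cdot_c 1 = \mathfrak{a}$
(iii) $\mathfrak{a} \cdot_c \mathfrak{b} = \mathfrak{b} \cdot_c \mathfrak{a}$
(iv) $(\mathfrak{a} \cdot_c \mathfrak{b}) \cdot_c \mathfrak{c} = \mathfrak{a} \cdot_c (\mathfrak{b} \cdot_c \mathfrak{c})$
(v) If $\mathfrak{a} \le \mathfrak{b}$, then $\mathfrak{a} \cdot_c \mathfrak{c} \le \mathfrak{b} \cdot_c \mathfrak{c}$
(vi) $\mathfrak{a} \cdot_c (\mathfrak{b} +_c \mathfrak{c}) = \mathfrak{a} \cdot_c \mathfrak{b} +_c \mathfrak{a} \cdot_c \mathfrak{c}$.


Proof. We apply VII.5.10 throughout. Thus, (i) follows from $A \times \emptyset = \emptyset$. (ii) follows from the fact that $x \mapsto (x, 0)$ is a 1-1 correspondence $A \to A \times 1$. For (iii) note that $A \times B \sim B \times A$ via $(x, y) \mapsto (y, x)$. (iv) is a consequence of $(A \times B) \times C \sim A \times (B \times C)$ via $(x, y, z) \mapsto (x, (y, z))$ (recall that $(x, y, z) = ((x, y), z)$).

For (v), let $f : A \to B$ be 1-1 and total. Then $(x, y) \mapsto (f(x), y)$ is 1-1 and total $A \times C \to B \times C$.

Finally, for (vi), take $\mathfrak{b} = \mathrm{Card}(B)$ and $\mathfrak{c} = \mathrm{Card}(C)$ with $B \cap C = \emptyset$, and note that $A \times (B \cup C) = (A \times B) \cup (A \times C)$ and that $(A \times B) \cap (A \times C) = \emptyset$.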


The following result, along with the Cantor-Bernstein theorem, assists in 
computations where +, and -, are involved. It shows that cardinal addition and 
multiplication are nowhere near as rich as their ordinal counterparts. 


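VII.5.14 Theorem. For any $\mathfrak{a} \ge \omega$, $\mathfrak{a} \cdot_c \mathfrak{a} = \mathfrak{a}$.


Proof. Let $\mathfrak{a} \ge \omega$ be the smallest cardinal for which the claim fails. Then $\mathfrak{a} > \omega$ by VII.5.11. Let $\mathfrak{a} \times \mathfrak{a} \cong \beta$, via the $J$ of Section VI.7, i.e., $J[\mathfrak{a} \times \mathfrak{a}] = \beta$. We know that $J[\mathfrak{a} \times \mathfrak{a}] \ge \mathfrak{a}$ by Exercise VI.30, so $\mathfrak{a} < \beta$; therefore

\[ \mathfrak{a} = J((\gamma, \delta)) \quad \text{for some } (\gamma, \delta) \in \mathfrak{a} \times \mathfrak{a} \qquad (1) \]

Since $\mathrm{Lim}(\mathfrak{a})$ (by VII.4.8), take a $\lambda < \mathfrak{a}$ to satisfy also $\max(\gamma, \delta) < \lambda$. Thus, the isomorphism in (1) establishes $\mathfrak{a} \preceq \lambda \times \lambda$. Therefore

\[ \mathfrak{a} = \mathrm{Card}(\mathfrak{a}) \le \mathrm{Card}(\lambda \times \lambda) = \mathrm{Card}(\lambda) \cdot_c \mathrm{Card}(\lambda) = \mathrm{Card}(\lambda) \qquad (2) \]

the last “$=$” by minimality of $\mathfrak{a}$, and $\mathrm{Card}(\lambda) \le \lambda < \mathfrak{a}$ (using VII.4.6). We now have a contradiction $\mathfrak{a} \le \mathrm{Card}(\lambda) < \mathfrak{a}$.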


A side effect of the proof is that $J[\aleph_\alpha \times \aleph_\alpha] = \aleph_\alpha$, for all $\alpha$.

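
For the countable case, the statement that J maps the pairs of natural numbers onto the natural numbers can be checked concretely. The Python sketch below assumes one common form of the canonical ordering of pairs: compare the maxima first, then the first components, then the second components. The J of Section VI.7 may break ties differently, so the closed form here is an assumption of this sketch rather than the text's definition.

```python
def J(m, n):
    """Position of (m, n) among pairs of naturals ordered by
    (max(m, n), m, n); assumed tie-breaking, see the lead-in."""
    k = max(m, n)
    # Pairs with maximum below k occupy positions 0 .. k*k - 1.
    # Next come (0, k), ..., (k-1, k), then (k, 0), ..., (k, k).
    return k * k + m if m < k else k * k + k + n
```

Under this ordering the pairs with both components below N occupy exactly the first N*N positions, so J restricted to them is a 1-1 correspondence onto that initial segment.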
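VII.5.15 Corollary. For any $\mathfrak{a} \ge \omega$, $\mathfrak{a} +_c \mathfrak{a} = \mathfrak{a}$.


Proof. Using the arithmetic of VII.5.13,

\[
\begin{aligned}
\mathfrak{a} +_c \mathfrak{a} &= \mathfrak{a} \cdot_c 1 +_c \mathfrak{a} \cdot_c 1 \\
&= \mathfrak{a} \cdot_c (1 +_c 1) \\
&= \mathfrak{a} \cdot_c (1 + 1) \\
&= \mathfrak{a} \cdot_c 2 \\
&\le \mathfrak{a} \cdot_c \mathfrak{a} && \text{by VII.5.13(iii) and (v)} \\
&= \mathfrak{a}
\end{aligned}
\]

But we also have $\mathfrak{a} \le \mathfrak{a} +_c \mathfrak{a}$.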


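VII.5.16 Example. Constructions in mathematics often result in a family of sets $(A_i)_{i \in I}$ where we have the “estimates” of cardinalities

\[ \mathrm{Card}(A_i) \le \mathfrak{a} \quad \text{for all } i \in I \qquad (1) \]

and

\[ \mathrm{Card}(I) \le \mathfrak{b} \qquad (2) \]

We can then estimate that

\[ \mathrm{Card}\Bigl(\bigcup_{i \in I} A_i\Bigr) \le \mathfrak{a} \cdot_c \mathfrak{b} \qquad (3) \]

Indeed, using AC, pick for each $i \in I$ an onto $f_i : \mathfrak{a} \to A_i$, which is legitimate by (1). Define $g : \mathfrak{a} \times I \to \bigcup_{i \in I} A_i$ by

\[ g(\gamma, i) = f_i(\gamma) \quad \text{for all } \gamma \in \mathfrak{a},\ i \in I \]

Now $g$ is onto; hence

\[ \mathrm{Card}\Bigl(\bigcup_{i \in I} A_i\Bigr) \le \mathrm{Card}(\mathfrak{a} \times I) = \mathfrak{a} \cdot_c \mathrm{Card}(I) \le \mathfrak{a} \cdot_c \mathfrak{b} \]

That is (3).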

Apart from our use of AC in the definition of cardinals (through the well-ordering theorem), the above result also invoked AC (twice) additionally (why twice?). AC can be avoided if we are content to prove instead: “Assume that cardinals were defined without AC, say as in VII.4.27. Now, assume (1) and (2) above, and moreover let $I$ and $\bigcup_{i \in I} A_i$ be well-orderable. Then (3) follows within ZF.”


Indeed, let w = ||(U;<; Ai, <1)|| with respect to some arbitrarily chosen well- 
ordering <j, of this set. Let us also pick a well-ordering <2 of J. Define for 
each x € Uje, Ai 


iel 


f@m= (iy), where i = (<2-min){j ¢ J: x € Aj} and 
Y = II(Ki (x), <DIl 


Clearly, f : U;-; Ai > I x @ is total and 1-1. Hence, 


Card (U 4) < Card x a) = Card(/) -. Card(a) = b-.a=a-.6 oe 


iel 


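VII.5.17 Example. Here is a situation where we may want to use the technique of the previous example: We have a first order language of logic, where the set of nonlogical symbols has cardinality $\mathfrak{k}$. How “many” formulas can this language have? Well, no more than strings over the alphabet of the language. Now the cardinality of the alphabet, $L$, is

\[
\mathrm{Card}(L) =
\begin{cases}
\omega & \text{if } \mathfrak{k} \le \omega \\
\mathfrak{k} & \text{otherwise}
\end{cases}
\]

where $\omega$ is the cardinality of the set of logical symbols (assuming the object variables are $v_0, v_1, \ldots$).

A “string” of length $n < \omega$ is, of course, a member of $L^n$. An easy induction on $n$, via VII.5.14, shows that

\[
\mathrm{Card}(L^n) \le
\begin{cases}
\omega & \text{if } \mathfrak{k} \le \omega \\
\mathfrak{k} & \text{otherwise}
\end{cases}
\]

Thus, the set of all strings over $L$, $\bigcup_{n \in \omega} L^n$, has cardinality

\[
\mathrm{Card}\Bigl(\bigcup_{n \in \omega} L^n\Bigr) \le
\begin{cases}
\omega \cdot_c \omega = \omega & \text{if } \mathfrak{k} \le \omega \\
\mathfrak{k} \cdot_c \omega = \mathfrak{k} & \text{otherwise}
\end{cases}
\]

Can you sharpen the $\le$ into $=$?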


We define, finally, cardinal exponentiation. This again turns out to be far too 
“easy” by comparison with ordinal exponentiation. 
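
VII.5.18 Definition. For any $\mathfrak{a}$, $\mathfrak{b}$, $\mathfrak{a}^{\mathfrak{b}}$ denotes $\mathrm{Card}({}^{\mathfrak{b}}\mathfrak{a})$.


First, let us give the analogues of VII.5.2 and VII.5.10.


VII.5.19 Proposition. If $\mathfrak{a} = \mathrm{Card}(A)$ and $\mathfrak{b} = \mathrm{Card}(B)$, then $\mathrm{Card}({}^{\mathfrak{b}}\mathfrak{a}) = \mathrm{Card}({}^{B}A)$.


Proof. Let $f : \mathfrak{a} \to A$ and $g : \mathfrak{b} \to B$ be 1-1 correspondences. As the following commutative diagram (cf. V.3.11) shows, $F : {}^{\mathfrak{b}}\mathfrak{a} \to {}^{B}A$ given by

\[ F(h) = f \circ h \circ g^{-1} \]

is a 1-1 correspondence, with inverse $\lambda k.\, f^{-1} \circ k \circ g : {}^{B}A \to {}^{\mathfrak{b}}\mathfrak{a}$:

\[
\begin{array}{ccc}
\mathfrak{b} & \xrightarrow{\quad h \quad} & \mathfrak{a} \\
{\scriptstyle g} \downarrow & & \downarrow {\scriptstyle f} \\
B & \xrightarrow{\; F(h) \;} & A
\end{array}
\]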


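VII.5.20 Remark. By VII.3.9, $2^{\mathrm{Card}(A)} = \mathrm{Card}({}^{A}2) = \mathrm{Card}(\mathcal{P}(A))$ for all sets $A$.

In particular,

\[
\begin{aligned}
2^{\aleph_0} &= \mathrm{Card}(\mathcal{P}(\omega)) \\
&= \mathfrak{c}, && \text{by VII.3.8}
\end{aligned}
\]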

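VII.5.21 Proposition (Cantor). Cardinal exponentiation obeys the following:

(i) $\mathfrak{a}^{0} = 1$
(ii) $\mathfrak{a}^{1} = \mathfrak{a}$
(iii) $\mathfrak{a}^{\mathfrak{k} +_c \mathfrak{l}} = \mathfrak{a}^{\mathfrak{k}} \cdot_c \mathfrak{a}^{\mathfrak{l}}$
(iv) $(\mathfrak{a}^{\mathfrak{k}})^{\mathfrak{l}} = \mathfrak{a}^{\mathfrak{k} \cdot_c \mathfrak{l}}$
(v) $\mathfrak{a}^{\mathfrak{k}} \le \mathfrak{b}^{\mathfrak{k}}$ whenever $\mathfrak{a} \le \mathfrak{b}$.


Proof. (i): The empty function $0$ is the only member of ${}^{0}\mathfrak{a}$; hence $\mathrm{Card}({}^{0}\mathfrak{a}) = 1$.
For (ii), the set of total functions $f : 1 \to \mathfrak{a}$ is ${}^{1}\mathfrak{a} = \{\{(0, \gamma)\} : \gamma < \mathfrak{a}\}$. Thus $\mathrm{Card}({}^{1}\mathfrak{a}) = \mathfrak{a}$, via the 1-1 correspondence $\{(0, \gamma)\} \mapsto \gamma$.

(iii): Let $\mathfrak{k} = \mathrm{Card}(K)$, $\mathfrak{l} = \mathrm{Card}(L)$, $\mathfrak{a} = \mathrm{Card}(A)$, where $K \cap L = \emptyset$. The reader can readily verify that $f \mapsto (f \upharpoonright K, f \upharpoonright L)$ is a 1-1 correspondence

\[ {}^{K \cup L}A \sim {}^{K}A \times {}^{L}A. \]

(iv): Let $K$, $L$, $A$ be as in (iii) (although $K \cap L = \emptyset$ is not required here). We need a 1-1 correspondence ${}^{(K \times L)}A \sim {}^{L}({}^{K}A)$. The function that maps $\lambda xy.f(x, y) \in {}^{K \times L}A$ to $\lambda y.(\lambda x.f(x, y)) \in {}^{L}({}^{K}A)$ fills the bill.

(v): Since $\mathfrak{a} \subseteq \mathfrak{b}$, $F : {}^{\mathfrak{k}}\mathfrak{a} \to {}^{\mathfrak{k}}\mathfrak{b}$ given by $F(g) = g$ is total and 1-1.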
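
The correspondence used for (iv) is what programmers call currying. The Python sketch below represents a total function on pairs from K and L as a dictionary keyed by pairs, and converts it to a dictionary of dictionaries indexed first by the element of L and then by the element of K, and back. The representation and the names `curry` and `uncurry` are assumptions made for illustration on finite sets only.

```python
def curry(f, K, L):
    """Turn f : K x L -> A (a dict keyed by pairs (x, y)) into the map
    y |-> (x |-> f(x, y)), a dict of dicts."""
    return {y: {x: f[(x, y)] for x in K} for y in L}

def uncurry(F, K, L):
    """Inverse direction: rebuild the dict keyed by pairs (x, y)."""
    return {(x, y): F[y][x] for x in K for y in L}
```

Because `uncurry` undoes `curry`, the map is 1-1, and every dictionary of dictionaries of the right shape arises from some function on pairs, which is the finite counterpart of the claimed 1-1 correspondence.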


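The next result shows that cardinal exponentiation coincides with ordinal exponentiation over $\omega$, just like the addition and multiplication.


VII.5.22 Proposition. For all $m, n$ in $\omega$, $m^{n} = m^{\cdot n}$ (cardinal exponentiation on the left, ordinal exponentiation on the right).


Proof. Induction on $n$. For $n = 0$ we have $m^{\cdot 0} = 1 = m^{0}$, the last equality by VII.5.21. Assume the claim for some frozen $n$, and proceed to $n + 1$.

\[
\begin{aligned}
m^{\cdot (n+1)} &= m^{\cdot n} \cdot m && \text{by VI.10.23} \\
&= m^{n} \cdot_c m && \text{by I.H. and VII.5.12} \\
&= m^{n} \cdot_c m^{1} && \text{by VII.5.21} \\
&= m^{n +_c 1} && \text{by VII.5.21} \\
&= m^{n+1} && \text{by VII.5.8}
\end{aligned}
\]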
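
For finite cardinals the proposition amounts to saying that the number of total functions from an n-element set to an m-element set is the usual power. The Python sketch below enumerates those functions as tuples of values; the name `functions` is an assumption of this illustration.

```python
from itertools import product

def functions(n, m):
    """All total functions from {0, ..., n-1} to {0, ..., m-1},
    each encoded as the tuple of its values (f(0), ..., f(n-1))."""
    return list(product(range(m), repeat=n))
```

Note the boundary cases: there is exactly one function with empty domain (the empty function), in agreement with VII.5.21(i), and none from a nonempty set into the empty set.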


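VII.5.23 Example. How big is $\aleph_0^{\aleph_0}$? Well, this is $\mathrm{Card}({}^{\omega}\omega)$; therefore

\[
\begin{aligned}
\mathfrak{c} &= \mathrm{Card}({}^{\omega}2) \\
&\le \mathrm{Card}({}^{\omega}\omega) && \text{since } {}^{\omega}2 \subseteq {}^{\omega}\omega \\
&\le \mathrm{Card}(\mathcal{P}(\omega \times \omega)) && \text{since } {}^{\omega}\omega \subseteq \mathcal{P}(\omega \times \omega) \\
&= \mathrm{Card}({}^{\omega \times \omega}2) && \text{by VII.3.9} \\
&= \mathfrak{c} && \text{by VII.5.20 and VII.2.9}
\end{aligned}
\]

Thus, $\aleph_0^{\aleph_0} = \mathfrak{c}$.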


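VII.5.24 Example. For any $n \in \omega - \{0\}$ and set $A$, we have $A^n \sim {}^{n}A$ via the 1-1 correspondence $(x_0, \ldots, x_{n-1}) \mapsto \{(i, x_i) : i \in n\}$. Thus $\mathrm{Card}({}^{n}A) = \mathrm{Card}(A^n)$.

In particular, $\mathfrak{a}^2 = \mathrm{Card}({}^{2}\mathfrak{a}) = \mathrm{Card}(\mathfrak{a} \times \mathfrak{a}) = \mathfrak{a} \cdot_c \mathfrak{a}$ for any $\mathfrak{a}$.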


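VII.5.25 Remark. We saw in the discussion following VII.4.26 (p. 466) that

\[ (\aleph_\alpha)^+ \le \mathrm{Card}(\mathcal{P}(\aleph_\alpha)) \]

or

\[ (\aleph_\alpha)^+ \le \mathrm{Card}({}^{\aleph_\alpha}2) = 2^{\aleph_\alpha} \]

Thus,
(1) an alternative formulation of GCH is

\[ \aleph_{\alpha+1} = 2^{\aleph_\alpha} \]

and
(2) we have just estimated $\mathfrak{a}^+$ in general:

\[ \mathfrak{a}^+ \le 2^{\mathfrak{a}} \]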


VII.6. Cofinality; More Cardinal Arithmetic; Inaccessible Cardinals 


How far can we “stretch” an ordinal $\alpha$ by applying to it a function $f$? That is, given $\alpha$ and a total function $f : \alpha \to \mathrm{On}$, how “big” can the elements of $\mathrm{ran}(f)$ be? Well, let $\beta$ be arbitrary. Define $f(0) = \beta$. Thus $1 = \{0\}$ is stretched, by $f$, to the arbitrarily large value $\beta$. Clearly an uninspiring answer.

A much better question to ask, which leads to fruitful answers, is: how far can we shrink a given ordinal $\alpha$ by some total function $f$? That is, what is the smallest $\beta$ such that $f : \beta \to \alpha$ and $\mathrm{ran}(f)$ “spreads as far to the right” in $\alpha$ as possible?

More precisely, let us define 


VII.6.1 Definition (Cofinal Subsets). Let $\emptyset \ne A \subseteq B$, and $<$ be an order on $B$ (hence also on $A$). We say that $A$ is cofinal in $B$ just in case for every $b \in B$ there is an $a \in A$ such that $b \le a$, where, of course, $x \le y$ means $x < y \vee x = y$.
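
On finite ordered sets the definition can be tested mechanically. The Python sketch below takes the reflexive order as a two-argument predicate; the function name `is_cofinal` and this way of passing the order are assumptions of the illustration.

```python
def is_cofinal(A, B, leq):
    """True if A is a nonempty subset of B and every b in B satisfies
    leq(b, a) for some a in A, where leq is the reflexive order on B."""
    A, B = set(A), set(B)
    return bool(A) and A <= B and all(any(leq(b, a) for a in A) for b in B)
```

In a finite total order a subset is cofinal exactly when it contains the maximum. For a partial order such as divisibility on {1, 2, 3}, the subset {2} is not cofinal while {2, 3} is.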


The set $A$ above “spreads as far to the right in $B$ as possible”. If $\beta \subseteq \alpha$, then $\beta$ cannot be cofinal in $\alpha$ in the above sense, unless $\beta = \alpha$. For this reason we have a somewhat different notion of “cofinal in” for ordinals.


VII.6.2 Definition. $\beta$ is cofinal in $\alpha \ne 0$ just in case there is a total function $f : \beta \to \alpha$ such that $\mathrm{ran}(f)$ is cofinal in $\alpha$ in the sense of VII.6.1. We say that $f$ maps $\beta$ cofinally into $\alpha$, and that $f : \beta \to \alpha$ is a cofinal map (function).
The cofinality of an ordinal $\alpha$, $\mathrm{cf}(\alpha)$, is the smallest ordinal that is cofinal in $\alpha$.

Thus, $\mathrm{cf}(\alpha)$ is the smallest ordinal into which we can “shrink” $\alpha$ via a (total) function. If $\mathrm{Lim}(\alpha)$, and $f : \beta \to \alpha$ is cofinal, then $\mathrm{ran}(f)$ is unbounded in $\alpha$ (hence $\sup \mathrm{ran}(f) = \bigcup \mathrm{ran}(f) = \alpha$), since $\gamma < \alpha$ implies $\gamma + 1 < \alpha$, and hence $\gamma < \gamma + 1 \le f(\sigma)$ for some $\sigma \in \beta$.†

From the preamble to the section it follows that $1 = \mathrm{cf}(\alpha + 1)$ for any $\alpha$. Also, $\omega = \mathrm{cf}(\omega)$, since for each $n \in \omega$ and $f : n \to \omega$, $\mathrm{ran}(f)$ is finite (Exercise VII.18); hence $\bigcup \mathrm{ran}(f) \in \omega$, and thus, for all $n \in \omega$, $\mathrm{cf}(\omega) \ne n$. Also, by VII.4.23, $\mathrm{cf}(\aleph_\omega) = \omega$; therefore some “huge” ordinals (in this case a cardinal) can shrink quite a bit.

Finally, it is clear that $\mathrm{cf}(\alpha) \le \alpha$, since, whenever $\mathrm{Lim}(\alpha)$, the identity function maps $\alpha$ cofinally into $\alpha$, while when $\alpha = \beta + 1$, $\mathrm{cf}(\alpha) = 1 \le \alpha$.


VII.6.3 Definition (Hausdorff). An ordinal $\alpha$ is regular provided $\mathrm{cf}(\alpha) = \alpha$. Otherwise, it is singular.


Thus, $1$ and $\omega$ are regular, all $n > 1$ in $\omega$ are singular, and so is $\aleph_\omega$.‡

VII.6.4 Proposition. For any $\alpha$, $\mathrm{cf}(\alpha)$ is a cardinal.


Proof. If $\alpha$ is a successor then the result is trivial. Otherwise, let $1 < \beta < \mathrm{cf}(\alpha)$ and $g : \beta \to \mathrm{cf}(\alpha)$ be a 1-1 correspondence. Suppose that $f$ maps $\mathrm{cf}(\alpha)$ to $\alpha$ cofinally. Thus, $\bigcup \mathrm{ran}(f) = \alpha$. Clearly $\bigcup \mathrm{ran}(f \circ g) = \alpha$ as well, thus $f \circ g$ maps $\beta$ cofinally into $\alpha$, contradicting the minimality of $\mathrm{cf}(\alpha)$.


Thus, all regular ordinals are cardinals. In particular, all, except 1, are limit ordinals.


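VII.6.5 Proposition. There is a total order-preserving cofinal map $f : \mathrm{cf}(\alpha) \to \alpha$.


Proof. The result is trivial if $\alpha$ is a successor.

Let then $\mathrm{Lim}(\alpha)$, and $g : \mathrm{cf}(\alpha) \to \alpha$ be such that $\bigcup \mathrm{ran}(g) = \alpha$. Define $f$ by recursion for all $\beta \in \mathrm{cf}(\alpha)$

\[ f(\beta) = g\bigl(\min\{\sigma \in \mathrm{cf}(\alpha) : (\forall \gamma < \beta)\, g(\sigma) > f(\gamma) \cup g(\beta)\}\bigr) \qquad (1) \]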


† Some authors offer the definition only for $\mathrm{Lim}(\alpha)$.

‡ If we allowed nontotal cofinal maps, then the modified definition would make 0 regular as well (Hausdorff). There is no universal agreement in the literature (see also previous footnote) on this point.
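
To see that $f$ is total on $\mathrm{cf}(\alpha)$, we argue that, for $\beta \in \mathrm{cf}(\alpha)$,

\[ \{\sigma \in \mathrm{cf}(\alpha) : (\forall \gamma < \beta)\, g(\sigma) > f(\gamma) \cup g(\beta)\} \ne \emptyset \qquad (2) \]

By (1), $\mathrm{ran}(f) \subseteq \alpha$. Thus, (2) follows, since

(i) $\mathrm{ran}(g)$ is unbounded in $\alpha$,
(ii) $\sup \mathrm{ran}(f \upharpoonright \beta) < \alpha$, since $\beta < \mathrm{cf}(\alpha)$.

Again by (1), $f(\gamma) < f(\beta)$ if $\gamma < \beta$, so that $f$ is order-preserving.
Finally, $\tau \in \alpha$ implies $\tau \le g(\sigma)$ for some $\sigma \in \mathrm{cf}(\alpha)$ by (i). By (1), $g(\sigma) \le f(\sigma)$; hence $\tau \in \bigcup \mathrm{ran}(f)$; therefore $\alpha \subseteq \bigcup \mathrm{ran}(f)$; thus $\alpha = \bigcup \mathrm{ran}(f)$.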


VII.6.6 Proposition. Let the order-preserving function f map o cofinally into 
B > a. Then cf(a) = cf(). 


Proof. Let g : cf(a) — a@ be a cofinal map. Then, so is f og : cf(a) > 8B. 
Indeed, let y € 8. By cofinality of f, f(a)>y for some o €a. Moreover 
(cofinality of g) g(€) => o (for some & € cf(@)). Since f is order-preserving, 
f(g) = f(o) = y. We conclude that 


cf(B) < cf(@) (1) 
If a is a successor, then cf(~) = | and the result follows from (1). 
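So let $\mathrm{Lim}(\alpha)$, and let $h : \mathrm{cf}(\beta) \to \beta$ be cofinal. Define a function $F : \beta \to \alpha$
by


$$F(\gamma) = \begin{cases} f^{-1}(\gamma) & \text{if } \gamma \in \mathrm{ran}(f) \\ \min\{\sigma \in \alpha : f(\sigma) > \gamma\} & \text{if } \gamma \notin \mathrm{ran}(f) \end{cases} \qquad \text{for all } \gamma \in \beta$$


$F$ is total, for the min in the bottom case always exists. Indeed, given $\gamma$, there
is a $\sigma \in \alpha$ such that $\gamma < f(\sigma) < f(\sigma + 1)$ [$\sigma + 1 \in \alpha$ by $\mathrm{Lim}(\alpha)$]. Hence
$(\exists \delta \in \alpha)\, f(\delta) > \gamma$. Moreover,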


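$$F(\gamma) < F(\delta) \text{ whenever } \mathrm{ran}(f) \ni \gamma < \delta \in \beta \tag{2}$$


Indeed, (2) is immediate if $\delta \in \mathrm{ran}(f)$ as well, since $f$ is order-preserving. If,
on the other hand, $\delta \notin \mathrm{ran}(f)$, then let $\sigma \in \alpha$ be minimum such that $f(\sigma) > \delta$.
Thus, $f(\sigma) > \delta > \gamma = f(\eta)$ for some $\eta \in \alpha$, and $F(\delta) = \sigma > \eta = F(\gamma)$.

Finally, $F \circ h : \mathrm{cf}(\beta) \to \alpha$ is cofinal. Indeed, let $\gamma \in \alpha$ and $\alpha \ni \delta > \gamma$
(by $\mathrm{Lim}(\alpha)$). Therefore $f(\delta) > f(\gamma)$. By cofinality of $h$, take $h(\sigma) > f(\delta)$, for
some $\sigma \in \mathrm{cf}(\beta)$ (strict inequality by $\mathrm{Lim}(\beta)$; why $\mathrm{Lim}(\beta)$?). It follows,
using (2), that $F(h(\sigma)) > F(f(\delta)) = \delta > \gamma$.

Thus $\mathrm{cf}(\alpha) \le \mathrm{cf}(\beta)$, and we are done by (1).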
VII.6.7 Corollary. For any $\alpha$, $\mathrm{cf}(\mathrm{cf}(\alpha)) = \mathrm{cf}(\alpha)$.


Proof. If $\alpha$ is regular, then the result is trivial. Otherwise, use VII.6.5 and
VII.6.6.


Thus, for all $\alpha$, $\mathrm{cf}(\alpha)$ is a regular cardinal. □


VII.6.8 Corollary. If $F$ is normal, then $\mathrm{cf}(F(\alpha)) = \mathrm{cf}(\alpha)$ for all limit
ordinals $\alpha$.


Proof. Since $F$ is order-preserving, $F(\alpha) \ge \alpha$. If we have equality, the result
is trivial. So let $F(\alpha) > \alpha$. The map


$$\alpha \ni \beta \mapsto F(\beta) \in F(\alpha)$$


is cofinal by normality. The result then follows from VII.6.6.


VII.6.9 Corollary. For all $\mathrm{Lim}(\alpha)$, $\mathrm{cf}(\aleph_\alpha) = \mathrm{cf}(\alpha)$.


Proof. $\alpha \mapsto \aleph_\alpha$ is normal.


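One often encounters, and accepts as common sense, the following
statement: “Let $X \subseteq \bigcup_{n \in \omega} A_n$, and $X$ be finite. Then, for some $m \in \omega$,
$X \subseteq \bigcup_{n < m} A_n$.” This is a special case of the following.


VII.6.10 Proposition. If $\mathfrak{m} \ge \omega$ is regular, $\mathrm{Card}(X) < \mathfrak{m}$, and $X \subseteq \bigcup_{\lambda < \mathfrak{m}} A_\lambda$,
then $X \subseteq \bigcup_{\lambda < \kappa} A_\lambda$ for some $\kappa < \mathfrak{m}$.


Proof. Let $\mathrm{Card}(X) = \mathfrak{n} < \mathfrak{m}$, and $g : \mathfrak{n} \to X$ be a 1-1 correspondence. Define
$f : \mathfrak{n} \to \mathfrak{m}$ by


$$f(\sigma) = \min\Bigl\{\tau \in \mathfrak{m} : g(\sigma) \in \bigcup_{\lambda < \tau} A_\lambda\Bigr\}$$


If the conclusion is false, then $f$ maps $\mathfrak{n}$ cofinally into $\mathfrak{m}$, contradicting the
regularity of the latter.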


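VII.6.11 Proposition. The infinite cardinal $\mathfrak{a}$ is singular iff, for some $\beta < \mathfrak{a}$
and a family of sets $(A_\alpha)_{\alpha < \beta}$ with $\mathrm{Card}(A_\alpha) < \mathfrak{a}$ for all sets in the family, one
has $\mathfrak{a} = \mathrm{Card}\bigl(\bigcup_{\alpha < \beta} A_\alpha\bigr)$.

Proof. Only-if part. Let $\beta = \mathrm{cf}(\mathfrak{a}) < \mathfrak{a}$, and $f : \beta \to \mathfrak{a}$ be cofinal. Thus
$\mathfrak{a} = \bigcup_{\alpha \in \beta} f(\alpha)$. The result follows using $A_\alpha = f(\alpha)$. Of course, $f(\alpha) < \mathfrak{a}$; hence
$\mathrm{Card}(f(\alpha)) \le f(\alpha) < \mathfrak{a}$.

If part. Suppose that $\mathfrak{a} = \mathrm{Card}\bigl(\bigcup_{\alpha < \beta} A_\alpha\bigr)$, where $\beta$ is the smallest ordinal that
satisfies the hypotheses above. For each $\alpha \in \beta$ set $f(\alpha) = \mathrm{Card}\bigl(\bigcup_{\gamma < \alpha} A_\gamma\bigr)$. As
$\bigcup_{\gamma < \alpha} A_\gamma \subseteq \bigcup_{\gamma < \beta} A_\gamma$, it follows that $\mathrm{Card}\bigl(\bigcup_{\gamma < \alpha} A_\gamma\bigr) \le \mathrm{Card}\bigl(\bigcup_{\gamma < \beta} A_\gamma\bigr) = \mathfrak{a}$.
The $\le$ graduates to $<$ by minimality of $\beta$; hence $f(\alpha) \in \mathfrak{a}$ ($\in$ is, of course, $<$).
Thus we have a function $f : \beta \to \mathfrak{a}$.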

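By VII.4.20, $\mathfrak{c} = \bigcup_{\alpha \in \beta} f(\alpha) = \sup_{\alpha \in \beta} f(\alpha)$ is a cardinal. Clearly, $\mathfrak{c} \le \mathfrak{a}$.
Can the inequality be strict? Well, if it can, then (using VII.5.16)


$$\begin{aligned}
\mathfrak{a} &= \mathrm{Card}\Bigl(\bigcup_{\alpha < \beta} A_\alpha\Bigr) \\
&= \mathrm{Card}\Bigl(\bigcup_{\alpha < \beta} \bigcup_{\gamma < \alpha} A_\gamma\Bigr) \\
&\le \mathfrak{c} \cdot_c \mathrm{Card}(\beta) = \max\{\mathfrak{c}, \mathrm{Card}(\beta)\} < \mathfrak{a}
\end{aligned}$$


which is a contradiction. Thus, $\mathfrak{c} = \mathfrak{a}$ and $f$ is cofinal. Since $\beta < \mathfrak{a}$, the result follows.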


(1) The above proposition can be rephrased to read exactly as above, but with 
B replaced by a cardinal b < a. In the only-if part this is so because cf(a) 
is a cardinal. In the if part it is so because the smallest ordinal 6 that makes 
the proof work is cf(a). But this is a cardinal. 

(2) As the notions “singular” and “regular” pertain to ordinals, the remark 
following VII.5.16 applies here, so that VII.6.11 is provable within ZF on 
the assumption a is well-orderable. Remarks such as that and the present 
one are only of value when one wants to gauge with accuracy which results 
follow, or do not follow, from what axioms. 


VII.6.12 Proposition. Every infinite successor cardinal is regular.

Proof. Let $\omega \le \mathfrak{a}$, and assume instead that $\kappa = \mathrm{cf}(\mathfrak{a}^+) < \mathfrak{a}^+$ ($\kappa$ is a cardinal
by VII.6.4). Let $f : \kappa \to \mathfrak{a}^+$ be cofinal. We observe that $f(\beta) < \mathfrak{a}^+$, hence
$\mathrm{Card}(f(\beta)) \le \mathfrak{a}$, for all $\beta \in \kappa$. Thus, using VII.5.16,


$$\mathfrak{a}^+ = \mathrm{Card}\Bigl(\bigcup_{\beta < \kappa} f(\beta)\Bigr) \le \kappa \cdot_c \mathfrak{a} \le \mathfrak{a} \cdot_c \mathfrak{a} = \mathfrak{a}$$


which is a contradiction.

It is known that without AC we cannot prove (in ZF) that $\aleph_1$ is regular (Feferman
and Levy (1963)). One cannot even prove (in ZF) that there are any infinite
regular cardinals at all beyond $\omega$ (Gitik (1980)).


Let us next turn our attention to regular limit cardinals beyond w. These have 
a special name. 


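VII.6.13 Definition. A cardinal $\mathfrak{a} > \aleph_0$ is weakly inaccessible iff it is regular
and a limit (i.e., for some $\alpha$ with $\mathrm{Lim}(\alpha)$, $\mathfrak{a} = \aleph_\alpha$).


By VII.6.9, if $\mathrm{Lim}(\alpha)$ and $\mathrm{cf}(\aleph_\alpha) = \aleph_\alpha$, then $\mathrm{cf}(\alpha) = \aleph_\alpha \ge \alpha$ (the inequality
holds by normality of $\aleph$). Hence


$$\alpha = \aleph_\alpha \tag{1}$$


since $\mathrm{cf}(\alpha) \le \alpha$. So, what is the first fixed point $\alpha$ of $\aleph$? By VI.5.42 and VI.5.43,
that will be $\alpha = \sup\{s_n : n < \omega\}$, where


$$s_0 = 0, \qquad s_{n+1} = \aleph_{s_n} \text{ for } n \ge 0$$

This $\alpha$ is quite huge, namely,
$$\aleph_{\aleph_{\aleph_{\ddots}}}$$
On the other hand, $\mathrm{cf}(\alpha) = \omega$ for this $\alpha$, since $n \mapsto s_n$ is cofinal. Thus $\mathrm{cf}(\aleph_\alpha) =
\mathrm{cf}(\alpha) = \omega < \aleph_\alpha$. We have just established that the first fixed point of $\aleph$, a huge
limit cardinal, is singular. As this was only the first candidate for a weakly
inaccessible cardinal, the first actual such cardinal will be even bigger, as it
must occur later in the aleph sequence.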
It turns out that within ZFC one cannot prove that weak inaccessibles exist. 
We will prove this relatively easy metamathematical fact below, but first we will 


need a notion of strongly inaccessible cardinals and some additional cardinal 
arithmetic tools. 


VII.6.14 Definition. A cardinal $\mathfrak{a}$ is strongly inaccessible, or just inaccessible,
iff it is weakly inaccessible and, moreover, for every infinite cardinal $\mathfrak{b} < \mathfrak{a}$,
$2^{\mathfrak{b}} < \mathfrak{a}$.


(1) Any cardinal $\mathfrak{a}$ that satisfies $\mathfrak{b} < \mathfrak{a} \to 2^{\mathfrak{b}} < \mathfrak{a}$ is called a strong limit.

(2) The above definition could also be phrased: “A cardinal $\mathfrak{a} > \aleph_0$ is strongly
inaccessible, or just inaccessible, iff it is regular and, moreover, for every infinite
cardinal $\mathfrak{b} < \mathfrak{a}$, $2^{\mathfrak{b}} < \mathfrak{a}$.” This is because for any $\mathfrak{b}$, $\mathfrak{b}^+ \le 2^{\mathfrak{b}}$: thus the “moreover”
part yields the implication $\mathfrak{b} < \mathfrak{a} \to \mathfrak{b}^+ < \mathfrak{a}$; hence $\mathfrak{a}$ is a limit cardinal and
therefore, in particular, weakly inaccessible.


In the presence of the generalized continuum hypothesis (GCH) that $2^{\mathfrak{b}} = \mathfrak{b}^+$
for all infinite cardinals, the requirement in VII.6.14 that $\mathfrak{b} < \mathfrak{a}$ implies $2^{\mathfrak{b}} < \mathfrak{a}$
is automatically satisfied, since $\mathfrak{a}$ is a limit cardinal. Thus, under GCH, weak
and strong inaccessibles coincide.


A strongly inaccessible in comparison with other (smaller) infinite cardinals
is like $\omega$ in comparison with smaller cardinals (natural numbers), since $n \in \omega$
implies $2^n \in \omega$.‡ □


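VII.6.15 Definition (Generalized Cardinal Addition). Let $(\mathfrak{k}_i)_{i \in I}$ be a family
of cardinals. Their sum, $\sum_{i \in I} \mathfrak{k}_i$, is defined to be $\mathrm{Card}\bigl(\bigcup_{i \in I} \{i\} \times \mathfrak{k}_i\bigr)$.


Intuitively, in the sum we “count” all the elements in all the $\mathfrak{k}_i$ and allow for
multiplicity of occurrence as well, since if $\mathfrak{k}_i = \mathfrak{k}_j$ with $i \ne j$, still
$(\{i\} \times \mathfrak{k}_i) \cap (\{j\} \times \mathfrak{k}_j) = \emptyset$.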


VII.6.16 Remark. The above definition, a straightforward generalization of
Definition VII.5.1, does not need AC, so it can be effected within ZF (with an
appropriate definition of cardinals that also avoids AC). In ZFC it is equivalent to
the commonly given statement (definitionally): “$\sum_{i \in I} \mathfrak{k}_i$ equals $\mathrm{Card}\bigl(\bigcup_{i \in I} K_i\bigr)$,
where $K_i \cap K_j = \emptyset$ whenever $i \ne j$, and $\mathfrak{k}_i = \mathrm{Card}(K_i)$ for all $i \in I$.”


Part of this is due to the obvious $\mathfrak{k}_i = \mathrm{Card}(\{i\} \times \mathfrak{k}_i)$. The rest (including the
immunity of the statement to the choice of $K_i$) follows from AC (the details
will be left to the reader: Exercise VII.59). □


An interesting phenomenon occurs in connection with the above remark:
$\sum_{i \in \aleph_0} \aleph_0$ (that is, $\sum_{i \in \aleph_0} \mathfrak{k}_i$ where every $\mathfrak{k}_i$ is $\aleph_0$) equals
$\mathrm{Card}\bigl(\bigcup_{i \in \aleph_0} \{i\} \times \aleph_0\bigr) = \mathrm{Card}(\aleph_0 \times \aleph_0) = \aleph_0$.

On the other hand, as we have noted a number of times before, Feferman
and Levy (1963) have shown that, in the absence of AC, it is possible to
have a countable family of mutually disjoint countable sets $(K_i)_{i \in I}$ such that
$\mathrm{Card}\bigl(\bigcup_{i \in I} K_i\bigr) = 2^{\aleph_0} > \aleph_0$. In lay terms, the “sum of the parts” can be
significantly less than the “whole”, without AC. □


VII.6.17 Proposition (Multiplication as Repeated Addition). For any
cardinals $\mathfrak{a}$ and $\mathfrak{b}$, $\mathfrak{a} \cdot_c \mathfrak{b} = \sum_{\alpha \in \mathfrak{a}} \mathfrak{b}$.


i “9” in the sense of Chapter V. This is the same as 2”. 
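Proof.


$$\sum_{\alpha \in \mathfrak{a}} \mathfrak{b} = \mathrm{Card}\Bigl(\bigcup_{\alpha \in \mathfrak{a}} \{\alpha\} \times \mathfrak{b}\Bigr) = \mathrm{Card}(\mathfrak{a} \times \mathfrak{b}) = \mathfrak{a} \cdot_c \mathfrak{b}$$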
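
For finite cardinals the tagging construction of VII.6.15 and the identity of VII.6.17 can be checked by brute force. The following is only an illustrative sketch: the function name `generalized_sum` and the representation of a family as a Python dict are assumptions made here, and each finite cardinal k is identified with the set {0, ..., k-1}.

```python
from itertools import product

def generalized_sum(family):
    """Card of the disjoint union of the sets {i} x k_i, for finite cardinals k_i.

    `family` maps each index i to a natural number k_i (hypothetical encoding).
    """
    tagged = {(i, x) for i, k in family.items() for x in range(k)}
    return len(tagged)

a, b = 4, 3
# Repeated addition: the constant family (b) indexed by the elements of a.
repeated = generalized_sum({alpha: b for alpha in range(a)})
# Cardinal product: Card(a x b).
cartesian = len(set(product(range(a), range(b))))
# Multiplicity is respected: two equal summands are counted twice,
# because the tags keep the copies disjoint.
doubled = generalized_sum({"i": 2, "j": 2})
```

Tagging each element with its index is what keeps equal summands from collapsing into one another.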


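VII.6.18 Definition (Generalized Cardinal Multiplication). Let $(\mathfrak{k}_i)_{i \in I}$ be a
family of cardinals. Their product is defined to be $\mathrm{Card}\bigl(\prod_{i \in I} \mathfrak{k}_i\bigr)$.


We prefer not to propose a symbol for the product of a family of cardinal
numbers, as there is no universal agreement. Of course, $\prod_{i \in I} \mathfrak{k}_i$ would be
inappropriate, as it already indicates something else: the Cartesian product of
the $\mathfrak{k}_i$.


If all the $\mathfrak{k}_i$ are equal to $\mathfrak{m}$, and $\mathrm{Card}(I) = \mathfrak{a}$, then
$\mathrm{Card}\bigl(\prod_{i \in I} \mathfrak{k}_i\bigr) = \mathrm{Card}({}^{I}\mathfrak{m}) = \mathfrak{m}^{\mathfrak{a}}$.


VII.6.19 Remark. If $A_i \sim B_i$ for $i \in I$, then $\prod_{i \in I} A_i \sim \prod_{i \in I} B_i$
(Exercise VII.64).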


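VII.6.20 Lemma (König). If $\mathfrak{a}_i < \mathfrak{b}_i$ for all $i \in I$, then
$$\sum_{i \in I} \mathfrak{a}_i < \mathrm{Card}\Bigl(\prod_{i \in I} \mathfrak{b}_i\Bigr)$$

Proof. Set $A_i = \{i\} \times \mathfrak{a}_i$; thus
$$\sum_{i \in I} \mathfrak{a}_i = \mathrm{Card}\Bigl(\bigcup_{i \in I} A_i\Bigr), \qquad \mathrm{Card}(A_i) = \mathfrak{a}_i \text{ for } i \in I$$
and the $A_i$ are pairwise disjoint.
We need to show that there is no onto function
$$g : \bigcup_{i \in I} A_i \to \prod_{i \in I} \mathfrak{b}_i$$
Suppose otherwise. Set, for each $i \in I$,


$$B_i = g[A_i], \text{ the image of } A_i \text{ under } g \tag{1}$$
Thus,
$$\prod_{i \in I} \mathfrak{b}_i = \bigcup_{i \in I} B_i$$
We next project $B_i$ along the $i$th coordinate to get
$$P_i = \{p(i) : p \in B_i\} \tag{2}$$
By (1), and the onto map $B_i \ni p \mapsto p(i) \in P_i$,


$$\mathrm{Card}(P_i) \le \mathrm{Card}(B_i) \le \mathrm{Card}(A_i) < \mathfrak{b}_i$$


Thus, $P_i \subsetneq \mathfrak{b}_i$ ($P_i \subseteq \mathfrak{b}_i$, by (2)). Now, using AC, define a total $p$ on $I$ with
$p(i) \in \mathfrak{b}_i - P_i$ for all $i \in I$. Clearly, $p \in \prod_{i \in I} \mathfrak{b}_i$, yet $p$ cannot be in $\mathrm{ran}(g)$, for, if
so, $p \in B_i$ for some $i \in I$, and hence $p(i) \in P_i$; a contradiction. Thus, $g$ cannot
be onto; a contradiction.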
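
The diagonal step of the proof can be replayed on a finite instance, where no appeal to AC is needed because a least unused value can always be picked. This is only a sketch under stated assumptions: the helper name `diagonal_witness`, the encoding of g as a dict, and the sample cardinals are all illustrative choices, not part of the text.

```python
import random
from itertools import product

def diagonal_witness(a, b, g):
    """Return a tuple p in the product of the b[i] that g misses.

    a, b: lists of natural numbers with a[i] < b[i] (finite cardinals).
    g: dict sending each tagged pair (i, x), x < a[i], to a tuple of the product.
    """
    p = []
    for i in range(len(a)):
        # P_i: projection of B_i = g[A_i] along coordinate i.
        P_i = {g[(i, x)][i] for x in range(a[i])}
        # Card(P_i) <= a[i] < b[i], so some element of b[i] is unused.
        p.append(min(set(range(b[i])) - P_i))
    return tuple(p)

a = [1, 2, 2]
b = [2, 3, 4]
target = list(product(*(range(k) for k in b)))   # the product of the b[i]
rng = random.Random(0)
g = {(i, x): rng.choice(target) for i in range(len(a)) for x in range(a[i])}
p = diagonal_witness(a, b, g)
```

Whatever g is chosen, `p` differs from every value g takes on A_i in coordinate i, so `p` is never in the range of g.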


König's lemma extends Cantor's diagonalization that lies behind Cantor's
theorem. Indeed, $\mathfrak{a} = \sum_{\alpha \in \mathfrak{a}} 1 < \mathrm{Card}\bigl(\prod_{\alpha \in \mathfrak{a}} 2\bigr) = 2^{\mathfrak{a}}$. □


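VII.6.21 Corollary. For all infinite cardinals $\mathfrak{a}$, $\mathfrak{a} < \mathfrak{a}^{\mathrm{cf}(\mathfrak{a})}$.


Proof. Let $f : \mathrm{cf}(\mathfrak{a}) \to \mathfrak{a}$ be cofinal. Then


$$\begin{aligned}
\mathfrak{a} &= \bigcup_{\beta < \mathrm{cf}(\mathfrak{a})} f(\beta) \\
&\le \sum_{\beta < \mathrm{cf}(\mathfrak{a})} \mathrm{Card}(f(\beta)) && \text{by Exercise VII.60} \\
&< \mathrm{Card}\Bigl(\prod_{\beta < \mathrm{cf}(\mathfrak{a})} \mathfrak{a}\Bigr) && \text{by VII.6.20, since } \mathrm{Card}(f(\beta)) \le f(\beta) < \mathfrak{a} \\
&= \mathfrak{a}^{\mathrm{cf}(\mathfrak{a})}
\end{aligned}$$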


VII.6.22 Corollary. $\mathrm{cf}(2^{\aleph_0}) > \aleph_0$.
Indeed, $\mathrm{cf}(2^{\aleph_\alpha}) > \aleph_\alpha$ for any $\alpha$ (see Exercise VII.65).


Proof. If we have equality, then (by VII.6.21) we get


$$2^{\aleph_0} < (2^{\aleph_0})^{\aleph_0} = 2^{\aleph_0 \cdot_c \aleph_0} = 2^{\aleph_0}$$

which is a contradiction.

In the absence of the CH, ZFC cannot pinpoint the cardinal $2^{\aleph_0}$ in the aleph
sequence with any certainty. If $2^{\aleph_0} = \aleph_1$, then fine. But if not, then it can be
(i.e., it is consistent with ZFC), as Cohen forcing has shown, that $2^{\aleph_0} = \aleph_2$ or
$2^{\aleph_0} = \aleph_3$, or, indeed, that $2^{\aleph_0}$ is weakly inaccessible provided existence of such
inaccessibles is consistent with ZFC.


However, we know that $2^{\aleph_0} \ne \aleph_\omega$, by VII.6.22, since $\mathrm{cf}(\aleph_\omega) = \aleph_0$.


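VII.6.23 Lemma. If $\mathfrak{m} < \mathrm{cf}(\mathfrak{a})$, then


$$\mathfrak{a}^{\mathfrak{m}} = \sum_{\alpha < \mathfrak{a}} \mathrm{Card}(\alpha)^{\mathfrak{m}}$$


Proof. Trivially,
$$\bigcup_{\alpha < \mathfrak{a}} {}^{\mathfrak{m}}\alpha \subseteq {}^{\mathfrak{m}}\mathfrak{a} \tag{1}$$


Let next $f \in {}^{\mathfrak{m}}\mathfrak{a}$. By the assumption, $\sup \mathrm{ran}(f) < \mathfrak{a}$; hence $f \in \bigcup_{\alpha < \mathfrak{a}} {}^{\mathfrak{m}}\alpha$. Thus
(1) is promoted to equality.


Next,


$$\begin{aligned}
\mathfrak{a}^{\mathfrak{m}} &= \mathrm{Card}\Bigl(\bigcup_{\alpha < \mathfrak{a}} {}^{\mathfrak{m}}\alpha\Bigr) \\
&\le \sum_{\alpha < \mathfrak{a}} \mathrm{Card}(\alpha)^{\mathfrak{m}} && \text{by Exercise VII.60} \\
&= \mathfrak{a} \cdot_c \sup_{\alpha < \mathfrak{a}} \mathrm{Card}(\alpha)^{\mathfrak{m}} && \text{by Exercise VII.62} \\
&\le \mathfrak{a}^{\mathfrak{m}} && \text{since } \mathrm{Card}(\alpha)^{\mathfrak{m}} \le \mathfrak{a}^{\mathfrak{m}} \text{ for all } \alpha < \mathfrak{a}.
\end{aligned}$$


VII.6.24 Corollary. If $\mathfrak{a}$ is regular and $\mathfrak{m} < \mathfrak{a}$, then


$$\mathfrak{a}^{\mathfrak{m}} = \sum_{\alpha < \mathfrak{a}} \mathrm{Card}(\alpha)^{\mathfrak{m}}$$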


VII.6.25 Corollary (Hausdorff). For all $\alpha$, $\beta$,


$$\aleph_{\alpha+1}^{\aleph_\beta} = \aleph_\alpha^{\aleph_\beta} \cdot_c \aleph_{\alpha+1}$$


Proof. For $\beta \le \alpha$ we apply VII.6.24 (see also VII.6.12 and VII.5.21) to obtain


x 
yo > Card(y )** (note the “<” 
ySRa 
x 
= De 
ySRa 
x 
= Ro "ce Rol 
x 
= ae “c Rat 
Ns 
a Rol 


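For $\alpha < \beta$ (hence also $\alpha + 1 \le \beta$) use Exercise VII.56 to obtain


$$\aleph_{\alpha+1}^{\aleph_\beta} = \aleph_\alpha^{\aleph_\beta} = 2^{\aleph_\beta}$$
Hence, from $\aleph_{\alpha+1} \le \aleph_\beta < 2^{\aleph_\beta}$ the contention becomes the following provable
statement:


$$2^{\aleph_\beta} = 2^{\aleph_\beta} \cdot_c \aleph_{\alpha+1}$$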


We conclude this excursion into “higher arithmetic” by noting how the 
adoption of GCH helps to further simplify cardinal arithmetic (in particular, 
exponentiation). 


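VII.6.26 Proposition. If we adopt GCH, then


(i) $\aleph_\alpha^{\aleph_\beta} = \aleph_{\beta+1}$ if $\alpha \le \beta$,
(ii) $\aleph_\alpha^{\aleph_\beta} = \aleph_{\alpha+1}$ if $\aleph_\alpha > \aleph_\beta \ge \mathrm{cf}(\aleph_\alpha)$,
(iii) $\aleph_\alpha^{\aleph_\beta} = \aleph_\alpha$ if $\mathrm{cf}(\aleph_\alpha) > \aleph_\beta$.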


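Proof. (i):
$$\aleph_\alpha^{\aleph_\beta} = 2^{\aleph_\beta} \ \text{(by Exercise VII.56)} \ = \aleph_{\beta+1} \ \text{(by GCH)}$$
(ii):

$$\begin{aligned}
\aleph_{\alpha+1} &= \aleph_\alpha^{\aleph_\alpha} && \text{by (i)} \\
&\ge \aleph_\alpha^{\aleph_\beta} && \text{by Exercise VII.57} \\
&\ge \aleph_\alpha^{\mathrm{cf}(\aleph_\alpha)} && \text{by Exercise VII.57} \\
&> \aleph_\alpha && \text{by VII.6.21}
\end{aligned}$$


Thus, $\aleph_\alpha^{\aleph_\beta} = \aleph_{\alpha+1}$.

(iii):


$$\begin{aligned}
\aleph_\alpha^{\aleph_\beta} &= \sum_{\gamma < \aleph_\alpha} \mathrm{Card}(\gamma)^{\aleph_\beta} && \text{by VII.6.23} \\
&= \aleph_\alpha \cdot_c \sup_{\gamma < \aleph_\alpha} \mathrm{Card}(\gamma)^{\aleph_\beta} && \text{by Exercise VII.62}
\end{aligned}$$


Now, for any $\gamma < \aleph_\alpha$,


$$\begin{aligned}
\mathrm{Card}(\gamma)^{\aleph_\beta} &= \mathrm{Card}({}^{\aleph_\beta}\gamma) \\
&\le \mathrm{Card}(\mathcal{P}(\aleph_\beta \times \gamma)) && \text{since } {}^{\aleph_\beta}\gamma \subseteq \mathcal{P}(\aleph_\beta \times \gamma) \\
&= 2^{\mathrm{Card}(\aleph_\beta \times \gamma)} \\
&= \mathrm{Card}(\aleph_\beta \times \gamma)^+ && \text{by GCH} \\
&= (\aleph_\beta \cdot_c \mathrm{Card}(\gamma))^+ \\
&= (\max(\aleph_\beta, \mathrm{Card}(\gamma)))^+ \\
&\le \aleph_\alpha
\end{aligned}$$


Thus, $\aleph_\alpha^{\aleph_\beta} = \aleph_\alpha$ in this case.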


We conclude this section by pondering the existence of inaccessibles. 


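VII.6.27 Lemma. If $\alpha$ is strongly inaccessible and $\mathrm{Card}(N) < \alpha$, where $N$ is
some arbitrarily chosen set of urelements, then $\mathrm{Card}(V_N(\alpha)) = \alpha$.


Proof. Since $\alpha \subseteq V_N(\alpha)$ by VI.6.8,
$$\alpha = \mathrm{Card}(\alpha) \le \mathrm{Card}(V_N(\alpha)) \tag{1}$$
Since $\mathrm{Lim}(\alpha)$, $V_N(\alpha) = \bigcup_{\beta < \alpha} V_N(\beta)$. Thus,
$$\mathrm{Card}(V_N(\alpha)) \le \sum_{\beta < \alpha} \mathrm{Card}(V_N(\beta)) \le \alpha \cdot_c \sup_{\beta < \alpha} \mathrm{Card}(V_N(\beta)) \tag{2}$$
To conclude, by induction on $\beta$, we show that $\mathrm{Card}(V_N(\beta)) < \alpha$ for all $\beta < \alpha$.
If $\beta = 0$, then $\mathrm{Card}(V_N(\beta)) < \alpha$ from the choice of $N$.
If $\beta = \gamma + 1$, then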

Card(Vy(y + 1)) = Card(P(N U Vu(y))) 
< max(Card(Viy (7), Card(N)) 


<a, by the I.H. and @’s “strong limit” property. 


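If $\mathrm{Lim}(\beta)$, then $V_N(\beta) = \bigcup_{\gamma < \beta} V_N(\gamma)$. By the I.H., $\mathrm{Card}(V_N(\gamma)) < \alpha$ for
$\gamma < \beta$; thus


$\mathrm{Card}(V_N(\beta)) < \alpha$, by the I.H. and VII.6.11, since $\beta < \alpha$.


Thus, (2) yields $\mathrm{Card}(V_N(\alpha)) \le \alpha$, and the result follows from (1).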
VII.6.28 Lemma. If $\alpha$ is strongly inaccessible and $\mathrm{Card}(N) < \alpha$, where $N$
is some arbitrarily chosen set of urelements, then $A \subseteq V_N(\alpha)$ and $\mathrm{Card}(A) <
\mathrm{Card}(V_N(\alpha))$ imply $A \in V_N(\alpha)$.


Proof. Now, $V_N(\alpha) = \bigcup_{\beta < \alpha} V_N(\beta)$, $\mathrm{Card}(V_N(\alpha)) = \alpha$, and $\alpha$ is regular.
By VII.6.10, $A \subseteq \bigcup_{\beta < \mathfrak{m}} V_N(\beta)$, where $\mathfrak{m} < \alpha$. Thus, $A \in V_N(\mathfrak{m} + 1)$ (“$+1$”
in the ordinal sense); hence $A \in V_N(\alpha)$.


VII.6.29 Lemma. If $\alpha$ is strongly inaccessible and $\mathrm{Card}(N) < \alpha$, where $N$ is
some arbitrarily chosen set of urelements, then for a set $A$, $A \in V_N(\alpha)$ implies
$\mathrm{Card}(A) < \alpha$.


Proof. $A \in V_N(\alpha)$ implies $A \in V_N(\beta)$ for some $\beta < \alpha$, and hence $A \subseteq N \cup V_N(\beta)$.
Thus, $\mathrm{Card}(A) \le \mathrm{Card}(N) +_c \mathrm{Card}(V_N(\beta)) < \alpha$, by the assumption (on $N$) and
by the proof of VII.6.27.


VII.6.30 Theorem. If $\alpha$ is strongly inaccessible and $\mathrm{Card}(N) < \alpha$, where $N$
is some arbitrarily chosen set of urelements, then $V_N(\alpha)$ is a formal model of
ZFC.


Proof. The proof is very similar to the proof that $N \cup \mathrm{WF}_N$ is a model of
ZFC (VI.6.13). Cf. also Sections VI.8 and VI.9. We prove that $\mathfrak{I} = (L_{\mathrm{Set}}, \mathrm{ZFC},
V_N(\alpha))$ is a formal model of ZFC. The verification then entails establishing
$\vdash_{\mathrm{ZFC}} \mathscr{A}^{V_N(\alpha)}$ for the universal closure $\mathscr{A}$ of each ZFC axiom.


Observe at the outset that Vy(a) = N U Vy(q@); hence it is transitive. It is 
also nonempty (why?) 


(i) The axiom “$(\exists x)(\forall y)(U(y) \leftrightarrow y \in x)$” relativizes to “$(\exists x \in V_N(\alpha))(\forall y \in
V_N(\alpha))(U(y) \leftrightarrow y \in x)$”. This is derivable (take $x = N$ and apply
substitution axiom).

(ii) The axiom “$(\forall x)(U(x) \to (\forall y)\, y \notin x)$” relativizes to “$(\forall x \in V_N(\alpha))
(U(x) \to (\forall y \in V_N(\alpha))\, y \notin x)$” and is a trivial consequence of the
unrelativized version.

(iii) Axiom of extensionality. Derivable by VI.8.10.

(iv) Axiom of separation. It says that for any set $B$ and class $A$, $A \subseteq B$
implies that $A$ is a set. To see why the relativization of this is derivable, let
$B \in V_N(\alpha)$ and $A \subseteq B$. By ZFC separation, $A$ is a set. We need to prove
that ($A$ is a set)$^{V_N(\alpha)}$, that is, $A \in V_N(\alpha)$. Well, we have $A \subseteq V_N(\alpha)$
by $A \subseteq B$ and transitivity. Since (VII.6.29) $\mathrm{Card}(B) < \alpha$, we have
$\mathrm{Card}(A) < \alpha$ and are done by VII.6.28.

(v) Axiom of foundation. Holds by VI.8.11.

(vi) Axiom of pairing. For any $a$, $b$ in $V_N(\alpha)$ we must show the derivability of
$((\exists y)\, y = \{a, b\})^{V_N(\alpha)}$. By VI.8.13 we need only show that $\{a, b\}^{V_N(\alpha)} \in
V_N(\alpha)$, or that $\{a, b\} \in V_N(\alpha)$, by VI.8.2. Well,


$$\rho(\{a, b\}) = \max(\rho(a), \rho(b)) + 1 < \alpha$$


and this settles it.†
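(vii) Axiom of union. For any set of sets $A \in V_N(\alpha)$ we need to show that


$$\Bigl((\exists y)\, y = \bigcup A\Bigr)^{V_N(\alpha)}$$


By VI.8.13, we need only show (using VI.8.16) that $\bigcup A \in V_N(\alpha)$. Well,
let $A \in V_N(\beta)$ with $\beta < \alpha$. Then $\bigcup A \subseteq N \cup V_N(\beta)$, since $x \in A$ implies
$x \in N \cup V_N(\beta)$ and thus $x \subseteq N \cup V_N(\beta)$. So $\rho(\bigcup A) \le \max(0, \beta) + 1 < \alpha$.

(viii) Power set axiom. We need to show that for any set $A \in V_N(\alpha)$,


$$((\exists y)\, y = \mathcal{P}(A))^{V_N(\alpha)}$$
or that (VI.8.13)
$$\mathcal{P}^{V_N(\alpha)}(A) \in V_N(\alpha)$$


By absoluteness of $\subseteq$, $\mathcal{P}^{V_N(\alpha)}(A) = \mathcal{P}(A) \cap V_N(\alpha) = \mathcal{P}(A)$, the last
equality because $x \subseteq A \in V_N(\beta)$ ($\beta < \alpha$) implies $x \subseteq A \subseteq N \cup V_N(\beta)$,
and hence $x \in V_N(\beta + 1)$. Thus, also, $\mathcal{P}(A) \subseteq \mathcal{P}(N \cup V_N(\beta))$; therefore
$\mathcal{P}(A) \in V_N(\beta + 2)$.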

(ix) Collection. For convenience, we approach collection via its equivalent
form, replacement, that is, “For any set $A$ and any function $f$, $f[A]$ is
a set.” We need, therefore, to show that for any set $A \in V_N(\alpha)$ and any
function $f \in V_N(\alpha)$


$$((\exists y)\, y = f[A])^{V_N(\alpha)}$$


or that $f[A] \in V_N(\alpha)$. Now, this is derivable by ZFC collection and
VII.6.28, for $f \in V_N(\alpha)$ implies $f[A] \subseteq V_N(\alpha)$, and


$$\mathrm{Card}(f[A]) \le \mathrm{Card}(A) < \alpha \qquad \text{(the last inequality by VII.6.29)}$$


(x) Axiom of infinity. We need an inductive set in Vy(q@). Since w € Vy(@), 
we are done. 


† One can also do this with a sledgehammer: $\{a, b\} \subseteq V_N(\alpha)$ and $\mathrm{Card}(\{a, b\}) \le 2 < \alpha$. Hence
$\{a, b\} \in V_N(\alpha)$, by VII.6.28.

(xi) AC. Let $S$ be a set of nonempty sets in $V_N(\alpha)$. We need a choice
function in $V_N(\alpha)$. By AC, there is a choice function, $f : S \to \bigcup S$, in ZFC,
such that $f(x) \in x$ for all $x \in S$. Now by (vii), $\bigcup S \in V_N(\alpha)$; hence
$S \times \bigcup S \in V_N(\alpha)$ (why?). Thus, $f \subseteq S \times \bigcup S$ implies $f \in V_N(\alpha)$.


VII.6.31 Theorem. It is consistent with ZFC that inaccessibles do not exist;
that is, if ZFC is consistent, then so is ZFC + $\neg(\exists \alpha)$($\alpha$ is (strongly) inaccessible).

Proof. (Metamathematical) It suffices to show that if ZFC is consistent, then
$$\mathrm{ZFC} \nvdash (\exists \alpha)(\alpha \text{ is inaccessible})$$
Suppose instead that
$$\mathrm{ZFC} \vdash (\exists \alpha)(\alpha \text{ is inaccessible}) \tag{1}$$


Then we have a proof in ZFC that the smallest inaccessible, $\beta$, exists. Now
introduce new constants $\beta$ for that inaccessible, and $N$ for a set of urelements
such that $\mathrm{Card}(N) < \beta$.† Thus, $V_N(\beta)$ is a (formal) model of ZFC. By (1), I.7.9,
and VII.6.30,


$$\mathrm{ZFC} \vdash ((\exists \alpha)(\alpha \text{ is inaccessible}))^{V_N(\beta)}$$
or
$$\mathrm{ZFC} \vdash (\exists \alpha \in V_N(\beta))(\alpha \text{ is inaccessible})^{V_N(\beta)} \tag{2}$$


Since “$\alpha$ is inaccessible” is absolute for $V_N(\beta)$ (see Exercise VII.71),‡ it follows
from (2) that there is a real inaccessible in $V_N(\beta)$, that is, (2) becomes


$$\mathrm{ZFC} \vdash (\exists \alpha \in V_N(\beta))(\alpha \text{ is inaccessible})$$


which contradicts the choice of $\beta$ (see VI.6.9). Since ZFC is consistent, this
contradiction establishes the original claim.


VII.6.32 Remark. (1) The above can be transformed to a ZFC proof, via a
formal model, that if ZFC is consistent, then so is ZFC + $\neg(\exists \alpha) I(\alpha)$ (where we
use “$I(\alpha)$” here as an abbreviation of “$\alpha$ is strongly inaccessible”). Once we fix
$N$ with, say, $\mathrm{Card}(N) \le \omega$, the model is $M = \{x : (\forall \alpha)(I(\alpha) \to x \in V_N(\alpha))\}$
with interpretation of $\in$, $U$ as themselves. This is clearly so, for there are two
cases: If there are no inaccessibles (i.e., $\neg(\exists \alpha) I(\alpha)$), then $M = U_N$ is a model of
ZFC + $\neg(\exists \alpha) I(\alpha)$; else $M = V_N(\beta)$, where $\beta$ is the smallest inaccessible, and
hence again (by VII.6.31) $M$ is a model of ZFC + $\neg(\exists \alpha) I(\alpha)$ since $(\exists \alpha) I(\alpha)$
is false in $M$ (I.7.4).


† The reader has had enough practice by now to see that augmenting ZFC thus (including the
relevant axioms, e.g., “$N$ is a set of atoms”, “$\mathrm{Card}(N) < \beta$”, etc.) results in a conservative
extension.

‡ Intuitively, if $\alpha \in V_N(\beta)$, then an inhabitant of $V_N(\beta)$ will perceive it as a strongly inaccessible
iff an inhabitant of $U_N$ does.


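(2) Can we, again in ZFC, prove consistency of ZFC + $(\exists \alpha) I(\alpha)$ (assuming
consistency of ZFC)? No, because this would clash with Gödel's second
incompleteness theorem, which says “In any extension $S$ of ZFC, if $S$ is recognizable
and consistent, then $S \nvdash \mathrm{CONS}(S)$”, where $\mathrm{CONS}(S)$ is a formula that says “$S$
is consistent”. In outline, this goes like this. Assume that we have a proof


$$\mathrm{ZFC} \vdash \mathrm{CONS}(\mathrm{ZFC}) \to \mathrm{CONS}(\mathrm{ZFC} + (\exists \alpha) I(\alpha)) \tag{i}$$
Hence we also have a proof in the extension
$$\mathrm{ZFC} + (\exists \alpha) I(\alpha) \vdash \mathrm{CONS}(\mathrm{ZFC}) \to \mathrm{CONS}(\mathrm{ZFC} + (\exists \alpha) I(\alpha)) \tag{ii}$$
By VII.6.30,
$$\mathrm{ZFC} + (\exists \alpha) I(\alpha) \vdash \mathrm{CONS}(\mathrm{ZFC}) \tag{iii}$$


since, if $\beta$ is any inaccessible and $\mathrm{Card}(N) < \beta$, then for the universal closure
$\mathscr{F}$ of every axiom of ZFC we have


$$\mathrm{ZFC} + (\exists \alpha) I(\alpha) \vdash \mathscr{F}^{V_N(\beta)}$$


By (ii), (iii), and modus ponens we derive a contradiction to Gödel's
incompleteness theorem:


$$\mathrm{ZFC} + (\exists \alpha) I(\alpha) \vdash \mathrm{CONS}(\mathrm{ZFC} + (\exists \alpha) I(\alpha))$$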


(3) How about weakly inaccessibles? Can we prove in ZFC (if this is
consistent!†) that weakly inaccessibles exist? Suppose we could. Then we could also
prove this in the extension theory ZFC + GCH (which is also consistent, as
Gödel has shown using his L). If $\beta$ is the smallest weakly inaccessible as far as
ZFC knows, then it is also the smallest strongly inaccessible in ZFC + GCH.
But then $V_N(\beta)$, constructed in ZFC + GCH with a well-chosen $N$, is a model
of ZFC. We have, as in VII.6.31,


$$\mathrm{ZFC} + \mathrm{GCH} \vdash (\exists \alpha \in V_N(\beta)) I(\alpha)$$


a contradiction to the choice of $\beta$. □


† By now this hedging must have become annoying. However, recall that if ZFC is inconsistent,
then for any formula $\mathscr{A}$ whatsoever, ZFC $\vdash \mathscr{A}$.
VII.7. Inductively Defined Sets Revisited; Relative Consistency of GCH


We have had a first acquaintance, in VII.1.16, with 


(1) sets defined inductively as closures under certain operations, 
(2) induction on inductively defined sets, and 
(3) the relation of this concept to that of set operators. 


We now inform this discussion a bit further by our understanding of cardinals. 


First let us expand our understanding of operation, so that we can now allow
infinitary operations. Apart from the wider applicability of the concept (for
example, the inductive definition of “computations in higher-type objects”
involves infinitary operations), this will have as a result that the “operation”
and “operator” approaches become equivalent (see the footnote to VII.1.31).


To this end, we will generalize operations, f, on a set S so that they have 
as argument list any set of members from S, rather than a finite sequence of 
such objects. Before we proceed to formalize, let us make sure that this makes 
intuitive sense, that indeed the new way of looking at rules subsumes VII.1.15 
and VII.1.16 as special cases. 

How do we indicate order of arguments if the arguments are just lumped
into a finite set? Well, an easy way to do this is to have many “rules” $X \mapsto x$
for any given input $X$, so that we incorporate all desirable outputs $x$ for all
the relevant permutations of the set $X$ (a permutation of $X$ is, of course, a
1-1 correspondence from $X$ to $X$). For example, a rule $\lambda xy.\, x - y$ (for which
order of arguments matters) would give rise to “rules” $\{x, y\} \mapsto x - y$ and
$\{x, y\} \mapsto y - x$ for all $x$, $y$.


In general, an “old rule” (VII.1.15) on a set $S$, $\lambda \vec{a}_n.\, f(\vec{a}_n)$, will give rise in
the present section to new rules


$$\{a_1, \ldots, a_n\} \mapsto f(a_{j_1}, \ldots, a_{j_n})$$


for all permutations $a_i \mapsto a_{j_i}$ of $\{a_1, \ldots, a_n\}$. In addition, we will allow rules
$X \mapsto x$ with $X$ possibly infinite, thus going beyond simply translating VII.1.15
into new notation.

While we are at it, we find it elegant to allow rules $\emptyset \mapsto x$. The right hand
sides of such rules ($x$) will play the role of the initial objects of VII.1.15. This
will unify the discussion, avoiding the annoying asymmetry between rules and
initial objects.


VII.7.1 Definition. A rule set $R$ on a given set $S$ is a relation $R \subseteq \mathcal{P}(S) \times S$.
Instead of writing $(X, x) \in R$ or $x\,R\,X$, we will prefer the notation $X \mapsto x$, or
$X \mapsto_R x$ if $R$ must be emphasized. A pair $(X, x)$ or, in the preferred notation,
$X \mapsto x$ will be called a rule.


A class $X$ is $R$-closed iff whenever $A \subseteq X$ then $R(A) \subseteq X$.


A rule set $R$ is finitary iff for all rules $X \mapsto x$ in $R$, $X$ is finite. Otherwise it
is infinitary.


VII.7.2 Remark. A set $X$ is $R$-closed iff $R[\mathcal{P}(X)] \subseteq X$, since


$$R[\mathcal{P}(X)] = \{a : (\exists A)(a\,R\,A \wedge A \subseteq X)\}$$


VII.7.3 Example. Every ordinal is closed under the solitary rule $\emptyset \mapsto 0$.
Every limit ordinal is closed under the rule set


$$\emptyset \mapsto 0, \qquad \{\alpha\} \mapsto \alpha + 1$$


It is clear that the rule sets are not single-valued relations in general. We will
often omit mention of the set $S$ such that $R \subseteq \mathcal{P}(S) \times S$.


VII.7.4 Definition. Given a rule set $R$. We say that a set $X$ is inductively, or
recursively, defined by $R$ iff $X$ is the $\subseteq$-smallest set that is $R$-closed.


Under these conditions, we also say that $X$ is the closure of $R$, in symbols
$X = \mathrm{Cl}(R)$.


As in VII.1.19, we have


VII.7.5 Proposition. For any rule set $R$, $\mathrm{Cl}(R)$ is uniquely defined by


$$\mathrm{Cl}(R) = \bigcap_{R[\mathcal{P}(X)] \subseteq X} X$$


Proof. Uniqueness. Say that $S$, $T$ are both candidates for $\mathrm{Cl}(R)$. Then $S \subseteq T$
and $T \subseteq S$ by VII.7.4; hence $S = T$.


Existence. To see that
$$S = \bigcap \{X : X \text{ is } R\text{-closed}\} \tag{1}$$


satisfies VII.7.4, observe that if $R$ is a rule set, then $\mathrm{ran}(R)$ is an $R$-closed
set, and hence $\{X : X \text{ is } R\text{-closed}\} \ne \emptyset$, so that $S$ is a set. Trivially, the
intersection of any set of $R$-closed sets is $R$-closed, so that $S$ is. Now, if
$T \in \{X : X \text{ is } R\text{-closed}\}$, then $S \subseteq T$.
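
For a finite rule set on a small finite set, the intersection in (1) can be computed literally and compared with the more familiar construction "from below". The sketch below is illustrative only; the function names and the sample rule set are assumptions made here, with a rule $X \mapsto a$ encoded as the pair `(frozenset(X), a)`.

```python
from itertools import chain, combinations

def is_closed(X, rules):
    """X is R-closed iff every rule A |-> a with A a subset of X has a in X."""
    return all(a in X for A, a in rules if A <= X)

def closure_by_intersection(S, rules):
    """Cl(R) as the intersection of all R-closed subsets of the finite set S."""
    elems = list(S)
    subsets = chain.from_iterable(
        combinations(elems, r) for r in range(len(elems) + 1))
    closed = [frozenset(X) for X in subsets if is_closed(frozenset(X), rules)]
    # S itself is R-closed, so `closed` is never empty.
    return frozenset.intersection(*closed)

def closure_by_stages(rules):
    """Cl(R) built from below: fire rules whose premises are already present."""
    X = set()
    changed = True
    while changed:
        changed = False
        for A, a in rules:
            if A <= X and a not in X:
                X.add(a)
                changed = True
    return frozenset(X)

S = frozenset(range(6))
rules = [
    (frozenset(), 0),         # empty premise: 0 is an "initial object"
    (frozenset({0}), 1),
    (frozenset({0, 1}), 2),
    (frozenset({4}), 5),      # never fires: 4 is not derivable
]
top_down = closure_by_intersection(S, rules)
bottom_up = closure_by_stages(rules)
```

Both computations give the set {0, 1, 2}; the stage-by-stage loop is only guaranteed to terminate here because the rule set is finite.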
VII.7.6 Corollary (Induction on the Structure of Cl(R), or R-Induction).
Let $\mathscr{T}(x)$ be a formula, and $R$ a rule set. To prove that $(\forall x \in \mathrm{Cl}(R))\,\mathscr{T}(x)$, it
suffices to prove that $\{x : \mathscr{T}(x)\}$ is $R$-closed.


Proof. Suppose that $\{x : \mathscr{T}(x)\}$ is $R$-closed. Then so is $C = \{x : \mathscr{T}(x)\} \cap \mathrm{ran}(R)$.
But $C$ is a set, hence $\mathrm{Cl}(R) \subseteq C \subseteq \{x : \mathscr{T}(x)\}$.


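VII.7.7 Example. Let $\mathscr{I}$ be a set of initial objects, and $\mathscr{F}$ a set of (function)
operations on a set $S$, as in VII.1.15. Form a rule set $R$ as follows:


$$\emptyset \mapsto_R a \quad \text{if } a \in \mathscr{I} \tag{1}$$
$$(\forall f \in \mathscr{F})\ \{a_1, \ldots, a_n\} \mapsto_R f(a_{j_1}, \ldots, a_{j_n}) \quad \text{for all permutations } a_i \mapsto a_{j_i} \tag{2}$$
where, in (2), $f(a_{j_1}, \ldots, a_{j_n})\!\downarrow$ is understood.


We now see that $\mathrm{Cl}(\mathscr{I}, \mathscr{F}) = \mathrm{Cl}(R)$.


$\subseteq$: By VII.1.20, we need to show that $\mathrm{Cl}(R)$ is $\mathscr{F}$-closed and that
$\mathscr{I} \subseteq \mathrm{Cl}(R)$. Now, since $\mathrm{Cl}(R)$ is $R$-closed, these contentions follow from (2)
and (1) respectively. [For example, if $a_i \in \mathrm{Cl}(R)$ for $i = 1, \ldots, n$, and if $f(\vec{a}_n)\!\downarrow$
for $f \in \mathscr{F}$, then $f(\vec{a}_n) \in \mathrm{Cl}(R)$ by (2).]

$\supseteq$: By VII.7.6, we need to show that $\mathrm{Cl}(\mathscr{I}, \mathscr{F})$ is $R$-closed. So let


$$\mathrm{Cl}(\mathscr{I}, \mathscr{F}) \supseteq \{a_1, \ldots, a_n\} \mapsto a \tag{3}$$

By (2), the only way that (3) is possible is that $a = f(a_{j_1}, \ldots, a_{j_n})$ for some
permutation of $a_1, \ldots, a_n$; hence $a \in \mathrm{Cl}(\mathscr{I}, \mathscr{F})$. We also need to settle the case
$$\mathrm{Cl}(\mathscr{I}, \mathscr{F}) \supseteq \emptyset \mapsto a \tag{4}$$


By (1) above, such $a$ are precisely those in $\mathscr{I}$; hence, again, $a \in \mathrm{Cl}(\mathscr{I}, \mathscr{F})$.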


We next relate $R$-induction to induction with respect to a well-founded
relation $Q : A \to A$ ($A$ a set).


VII.7.8 Example. Let $Q : A \to A$ ($A$ a set) have IC, and define $R$ by
$$Q(x) \mapsto_R x \quad \text{for all } x \in A$$
Thus,


$$X \subseteq A \text{ is } R\text{-closed iff } Q(x) \subseteq X \text{ implies } x \in X \tag{1}$$

Two remarks:


(i) By $Q$-induction (in the sense of VI.2.1), the right hand side of “iff” above
says that $A \subseteq X$, and hence $A = X$.

(ii) Thus, by (1), since $\mathrm{Cl}(R) \subseteq A$ (why is $\mathrm{Cl}(R) \subseteq A$?) and $\mathrm{Cl}(R)$ is $R$-closed,
we get that $A = \mathrm{Cl}(R)$.


A side effect is that instead of $Q$-induction, to prove properties of $A$ one
can do $R$-induction. Indeed, the I.H. with respect to one relation is identical to
that with respect to the other: For $Q$-induction we assume $Q(x) \subseteq X$ (towards
proving $x \in X$). For $R$-induction we want to show that $X$ is $R$-closed, but that,
by (1), amounts again to assuming $Q(x) \subseteq X$ (towards proving $x \in X$).


VII.7.9 Example (Some Pathologies). Let $R$ be a rule set on a set $A$ given by
$\{a\} \mapsto a$. Then every set $X$ is $R$-closed, so that $\emptyset = \mathrm{Cl}(R)$.


As another pathological case, let the rule set $R$ on set $B$ be such that whenever
$X \mapsto x$, $X \ne \emptyset$. Now, $\mathrm{Cl}(R)$ exists anyhow, by VII.7.5. By VII.7.6 we can prove
properties of $\mathrm{Cl}(R)$ by $R$-induction. This looks strange in view of VII.7.8, for
one can start with any $Q : B \to B$, then define $R$ exactly as in VII.7.8, and lo
and behold have an “induction tool” over $B$. Is the assumption that $Q$ has MC
(or IC) on $B$ really important?

Yes. If not, one could end up with an $R$ as here described (due to the
absence of $Q$-minimal elements). Under the circumstances, $\emptyset$ is $R$-closed; hence
$\mathrm{Cl}(R) = \emptyset$, and we can do $R$-induction over $\emptyset$, not over $B$. Hardly an exciting
prospect.


The moral is that: 


(1) We cannot bring in induction over the entire field of Q through the back 
door (via R of VII.7.8), if Q does not have IC on B. Our ability to do 
induction in these cases is restricted to some (often — as above — trivial) 
subset, Cl(R), of the field of Q (see Exercise VII.73 for what we can say 
in the general case). 

(2) To define “useful” closures, the rule set must have rules with empty premises 
($\emptyset \mapsto a$).
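
Examples VII.7.8 and VII.7.9 can be compared on small finite relations. In the sketch below (illustrative only; the helper names and the two sample relations are assumptions), a relation $Q$ is a set of pairs `(y, x)` read as "$y\,Q\,x$", and the rule set of VII.7.8 is built from it.

```python
def closure(rules):
    """Least R-closed set for a finite rule set, computed in stages."""
    X = set()
    changed = True
    while changed:
        changed = False
        for A, a in rules:
            if A <= X and a not in X:
                X.add(a)
                changed = True
    return X

def rules_from_relation(A, Q):
    """Rule set of VII.7.8: Q(x) |-> x for every x in A, Q(x) = {y : y Q x}."""
    return [(frozenset(y for y in A if (y, x) in Q), x) for x in A]

A = {0, 1, 2, 3}
# y Q x iff y < x: well-founded on A, so Q-induction is available.
Q_wf = {(y, x) for x in A for y in A if y < x}
# 0 and 1 precede each other, so no element is Q-minimal.
Q_cyc = {(1, 0), (0, 1), (1, 2), (2, 3)}

cl_wf = closure(rules_from_relation(A, Q_wf))
cl_cyc = closure(rules_from_relation(A, Q_cyc))
```

In the well-founded case the closure is all of `A`; in the cyclic case no rule has an empty premise, so the closure is empty, as in the second pathology.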


VII.7.10 Definition (Immediate Predecessors, Ambiguity). Let R be a rule 
set. For each a € ran(R) we define its immediate predecessors as follows: 


If $(A, a) \in R$, then $A$ is an immediate predecessor set (i.p.s.), while each
member of $A$ is an immediate predecessor (i.p.), of $a$. If $\emptyset \mapsto a$ is the only rule
involving $a$, then $a$ has no immediate predecessors.

The transitive closure of the i.p. relation is the predecessor relation.


If for some $a \in \mathrm{ran}(R)$ there are $A \ne B$ such that both $(A, a) \in R$ and
$(B, a) \in R$, then $R$ is an ambiguous rule set; otherwise it is unambiguous.


Thus, $R$ is unambiguous iff, for all $a \in \mathrm{ran}(R)$, $R^{-1}(a)$ is a singleton (its sole
member is the unique i.p.s. of $a$), in short, $R^{-1}$ is a function.


Aczel (1978) calls what we termed ambiguous rule sets “nondeterministic”
(Hinman (1978) calls them “non-monomorphic”). We prefer the above
logy, as it is consistent with its usage towards characterizing a related pheno- 
menon in formal language theory. Similarly, the term “nondeterministic” is 
reserved in automata and language theory for rule sets that are not single- 
valued (as opposed to their inverses not being single-valued — which is what 
concerns us here). 


VII.7.11 Example. Ambiguity makes it hard (often impossible) to define func- 
tions on Cl(R) the natural way, i.e., by induction on the formation of C1(R). 
Here is an example. 

Let us define symbol sequences using the symbols 1, 2, 3, +, x by the rule 
set R given as follows: 


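$\emptyset \mapsto 1$

$\emptyset \mapsto 2$

$\emptyset \mapsto 3$
$\{x, y\} \mapsto x + y$
$\{x, y\} \mapsto y + x$
$\{x, y\} \mapsto x \times y$
$\{x, y\} \mapsto y \times x$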


Cl(R) is the set of (strings denoting) non-parenthesized arithmetic expressions 
that utilize addition, multiplication, and the “constants” 1, 2, 3. 


Suppose we want to define the value, $\mathrm{val}(E)$, of any such expression, $E$. The
natural way to do so is


$\mathrm{val}(1) = 1$
$\mathrm{val}(2) = 2$
$\mathrm{val}(3) = 3$


$\mathrm{val}(x + y) = \mathrm{val}(x) + \mathrm{val}(y)$ if $x$, $y$ are the i.p.'s of $x + y$
$\mathrm{val}(x \times y) = \mathrm{val}(x) \times \mathrm{val}(y)$ if $x$, $y$ are the i.p.'s of $x \times y$


It should be clear that the above definition of val is, intuitively, “ambiguous” or
“ill-defined” (terminology that was formally adopted in VII.7.10). For example,
there are two choices of i.p. sets for $1 + 2 \times 3$. One choice is $x = 1 + 2$ and
$y = 3$ (under $\times$), so that


$$\begin{aligned}
\mathrm{val}(1 + 2 \times 3) &= \mathrm{val}(1 + 2) \times \mathrm{val}(3) \\
&= (\mathrm{val}(1) + \mathrm{val}(2)) \times \mathrm{val}(3) \\
&= (1 + 2) \times 3 = 9
\end{aligned}$$


while the other choice is $x = 1$ and $y = 2 \times 3$ (under $+$), so that


$$\begin{aligned}
\mathrm{val}(1 + 2 \times 3) &= \mathrm{val}(1) + \mathrm{val}(2 \times 3) \\
&= \mathrm{val}(1) + (\mathrm{val}(2) \times \mathrm{val}(3)) \\
&= 1 + (2 \times 3) = 7
\end{aligned}$$


We get different results! Even $1 + 2 + 3$ has two possible sets of i.p.'s, although
this does not create a problem for val, since $+$ is commutative.
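
The ambiguity can be made concrete by collecting, for a given string, every value that some choice of immediate predecessors would produce. This is only a sketch: the function name `values` is hypothetical, the symbol `x` stands for multiplication, and expressions are plain Python strings.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def values(expr):
    """All values that 'val' could assign to expr, one per choice of i.p. set."""
    if expr in ("1", "2", "3"):
        return frozenset({int(expr)})
    results = set()
    for k, symbol in enumerate(expr):
        if symbol in "+x":
            # Each operator occurrence yields one candidate pair of i.p.'s.
            left, right = expr[:k], expr[k + 1:]
            for u in values(left):
                for v in values(right):
                    results.add(u + v if symbol == "+" else u * v)
    return frozenset(results)

ambiguous = values("1+2x3")     # splits at '+' and at 'x' disagree
harmless = values("1+2+3")      # two splits, but the same value
```

A string is assigned a unique value by every derivation exactly when the returned set is a singleton, which is what freedom from ambiguity would guarantee in advance.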


It is not always easy to prove that a rule set is unambiguous (it is much easier, 
in general, to spot an ambiguity). The reader will be asked in the Exercises sec- 
tion to check that a few familiar rule sets are unambiguous (Exercises VII.74 
to VII.77). Freedom from ambiguity is important in an inductive definition 
effected by a rule set R, for we can then “well-define”’, recursively, functions 
by induction over Cl(R). Examples of such functions are the val function over 
arithmetic expressions (assuming that arithmetic expressions are defined more 
carefully than in VII.7.11: brackets would have helped — see Exercise VII.75), 
the truth-value function on formulas (propositional calculus), assigning “mean- 
ing” (i.e., “interpretation” over some structure) to terms and formulas of a first 
order language, numerous definitions on “trees”, and more. The following result 
allows such recursive definitions. 


VII.7.12 Example. We continue on the theme of Example VII.7.8 by looking at
the converse situation. Now we are given $A = \mathrm{Cl}(R)$, where $R$ is unambiguous.
We define $Q : A \to A$ by

$$y \mathrel{Q} x \quad\text{iff}\quad y \text{ is an i.p. of } x \text{ with respect to } R$$

Thus,

$$\text{if } X \text{ is the (unique) i.p.s. of } x, \text{ then } X = Q(x) \tag{1}$$

Does $Q$ have IC? Well, suppose that $S$ is a set for which we know that

$$Q(x) \subseteq S \to x \in S \tag{2}$$

Can we conclude that $A \subseteq S$? Indeed we can, as follows: Let $Y \subseteq S$ and $Y \mapsto y$.
By (1), $Q(y) = Y \subseteq S$. By (2), $y \in S$. Thus, $S$ is $R$-closed, hence $A \subseteq S$.

It follows that, for unambiguous $R$, $R$-induction can be replaced by i.p.
induction (see however VII.7.9).


VII.7.13 Theorem (Recursion over a Closure). Let $R$ be an unambiguous
rule set on a set $A$, and $g$ a total function on $A \times \mathcal{P}(A \times \mathrm{ran}(g))$. Then there is
a unique total function $f$ on $\mathrm{Cl}(R)$ satisfying, for all $a \in \mathrm{Cl}(R)$,

$$f(a) = g(a, f \restriction X_a) \quad\text{where } X_a \text{ is the unique set such that } X_a \mapsto a$$

Proof. Define $Q$ on $\mathrm{Cl}(R)$ by

$$x \mathrel{Q} y \quad\text{iff}\quad x \in Y \mapsto y$$

Thus, for each $y \in \mathrm{Cl}(R)$, $Q(y)$ is the unique i.p.s. of $y$, or

$$Y = Q(y) \quad\text{iff}\quad Y \mapsto y$$

and the recurrence in the statement of the theorem becomes

$$f(a) = g(a, f \restriction Q(a))$$

Since $Q$ has IC on $\mathrm{Cl}(R)$ (by Example VII.7.12) we are done, via VI.2.28.


Several variations are possible for VII.7.13 (see Section VI.2), but we will 
not pursue them here. 
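
For illustration, here is a hedged Python sketch of recursion over a closure in the unambiguous case. It assumes the fully parenthesized expressions of Exercise VII.75, with the ASCII letter `x` standing for the multiplication sign; the helper names `immediate_predecessors` and `val` are illustrative only. Because every compound expression has exactly one top-level operator, the decomposition into immediate predecessors is unique and the recursive definition of val is well defined.

```python
def immediate_predecessors(e: str):
    """Return (op, x, y) when e is the string (x op y), or None for a constant.

    Assumes e is a fully parenthesized expression over 1, 2, 3, +, x.
    """
    if e in ("1", "2", "3"):
        return None
    if not (e.startswith("(") and e.endswith(")")):
        raise ValueError("not a well-formed expression: " + e)
    depth = 0
    for k, symbol in enumerate(e):
        if symbol == "(":
            depth += 1
        elif symbol == ")":
            depth -= 1
        elif symbol in "+x" and depth == 1:
            # The only operator at bracket depth 1 is the outermost one.
            return symbol, e[1:k], e[k + 1:-1]
    raise ValueError("not a well-formed expression: " + e)


def val(e: str) -> int:
    """Value of e, computed by recursion on its unique immediate predecessors."""
    parts = immediate_predecessors(e)
    if parts is None:
        return int(e)
    op, x, y = parts
    return val(x) + val(y) if op == "+" else val(x) * val(y)


print(val("(1+(2x3))"))  # 7
print(val("((1+2)x3)"))  # 9
```

The sketch reads the order of the two predecessors off the string itself; this corresponds to the first argument $a$ of $g$ in the theorem, since the set $X_a$ alone does not record order.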


We return to operators $\Gamma : \mathcal{P}(X) \to \mathcal{P}(X)$ (Definition VII.1.25). It is now the
case (compare with VII.1.31, footnote) that every monotone operator gives rise
to an equivalent rule set.


VII.7.14 Proposition. For every monotone operator $\Gamma : \mathcal{P}(X) \to \mathcal{P}(X)$ there is
a rule set $R$ on $X$ such that $\overline{\Gamma} = \mathrm{Cl}(R)$.


Proof. Define $R$ for each $A \subseteq X$, $a \in X$:

$$A \mapsto a \quad\text{iff}\quad a \in \Gamma(A) \tag{1}$$

Now, a set $Z \subseteq X$ is $R$-closed just in case $Z \supseteq A \mapsto a$ implies $a \in Z$. This, in
view of (1) and monotonicity, says that

$$Z \subseteq X \text{ is } R\text{-closed} \quad\text{iff}\quad \Gamma(Z) \subseteq Z$$

As in VII.1.27,

$$\overline{\Gamma} = \bigcap_{\substack{Z \subseteq X\\ Z \text{ is } R\text{-closed}}} Z = \mathrm{Cl}(R)$$


That, conversely, a rule set $R$ leads to a monotone operator $\Gamma$ such that
$\mathrm{Cl}(R) = \overline{\Gamma}$ is proved as in VII.1.31, and we will not revisit it here. We conclude
the section with the introduction of the stages of construction of $\overline{\Gamma}$, since,
intuitively, what is happening is the iteration of a “construction”

    $S \leftarrow \emptyset$
    repeat until $S$ converges to $\overline{\Gamma}$
        $S \leftarrow S \cup \Gamma(S)$

that is, at each stage, we add to the $S$ that we have so far all the new points
we constructed in $\Gamma(S)$ (cf. the “abstraction” of this in VI.5.47).


In the interest of greater flexibility in applying the operator concept, we relax
the requirement that an operator $\Gamma$ be necessarily a set (viz., its left field and
right field may be a proper class $X$ whose members are sets).


VII.7.15 Definition (Stages). Let $\Gamma$ be a monotone operator, that is, possibly, a
proper class total function that carries sets to sets. We define by recursion over
On:

$$\Gamma^{\alpha} = \Gamma^{<\alpha} \cup \Gamma(\Gamma^{<\alpha})$$

where

$$\Gamma^{<\alpha} = \bigcup_{\beta < \alpha} \Gamma^{\beta}$$

for all $\alpha$.

We call the set $\Gamma^{\alpha}$ the $\alpha$th stage (often, by abuse of terminology, we refer
to the ordinal $\alpha$ itself as the $\alpha$th stage). An element $s \in \Gamma^{\alpha}$ has level or stage
$\le \alpha$. It has level $\alpha$ if moreover $s \notin \Gamma^{\beta}$ for all $\beta < \alpha$. We write $\overline{\Gamma} = \bigcup_{\alpha} \Gamma^{\alpha}$
or $\Gamma^{\infty} = \bigcup_{\alpha} \Gamma^{\alpha}$. We call $\overline{\Gamma}$ ($\Gamma^{\infty}$) the class inductively defined by the
operator $\Gamma$.

The notation $\Gamma^{\alpha}$ might be confusing at first sight. This is the $\alpha$-th set constructed;
it is not an operator. By the way, an easy induction on $\alpha$ shows that $\Gamma^{\alpha}$ is indeed
a set (Exercise VII.78).

The notation $\Gamma^{<\alpha}$ for the union of all the stages before $\alpha$ is due to Moschovakis
(it corresponds to the set $S$ we used in the pseudo-program above). Note that
$\Gamma^{<0} = \emptyset$ and hence $\Gamma^{0} = \Gamma(\emptyset)$.
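
The definition of stages can be tried out on a finite example. The sketch below is an illustration only: the function `stages` and the sample operator `gamma` on subsets of $\{0, \ldots, 20\}$ are assumptions made for this example, not objects from the text. It computes $\Gamma^{0}, \Gamma^{1}, \ldots$ and stops at the first $\alpha$ with $\Gamma^{<\alpha} = \Gamma^{\alpha}$; in the finite case that $\alpha$ is a natural number.

```python
def stages(gamma, max_steps=10_000):
    """Return ([stage_0, stage_1, ...], closure) for a monotone operator gamma.

    The list holds the stages strictly below the first index alpha at which
    the union of earlier stages equals stage alpha; closure is that union.
    """
    below = frozenset()          # union of all earlier stages; empty at alpha = 0
    computed = []
    for _ in range(max_steps):
        stage = below | gamma(below)   # stage alpha = below ∪ gamma(below)
        if stage == below:             # nothing new: the iteration has converged
            return computed, below
        computed.append(stage)
        below = stage                  # stages increase, so the union is the last one
    raise RuntimeError("no stabilization within max_steps")


def gamma(S):
    """Sample monotone operator: always yields 0, and s + 3 for each s in S (capped at 20)."""
    return frozenset({0}) | frozenset(s + 3 for s in S if s + 3 <= 20)


stage_list, closure = stages(gamma)
print(len(stage_list))           # 7: the first alpha with no new elements
print(sorted(closure))           # [0, 3, 6, 9, 12, 15, 18]
print(gamma(closure) == closure) # True: the closure is a fixed point
```

Here `len(stage_list)` plays the role of the ordinal at which the construction stops, and the final `closure` is the set that the pseudo-program converges to.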


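VII.7.16 Lemma. Let $\Gamma$ be any monotone operator (not necessarily a set). If
for some $\alpha$, $\Gamma^{<\alpha} = \Gamma^{\alpha}$, then:

(1) $\overline{\Gamma} = \Gamma^{\alpha} = \Gamma^{\beta}$ for $\beta \ge \alpha$.
(2) $\overline{\Gamma}$ is a fixed point of $\Gamma$, that is, $\Gamma(\overline{\Gamma}) = \overline{\Gamma}$.


Proof. (1): Assume (I.H.) that $\alpha \le \gamma < \beta$ implies $\Gamma^{\alpha} = \Gamma^{\gamma}$. Thus,

$$\begin{aligned}
\Gamma^{<\beta} &= \bigcup_{\gamma < \beta} \Gamma^{\gamma}\\
&= \Gamma^{<\alpha} \cup \bigcup_{\alpha \le \gamma < \beta} \Gamma^{\gamma}\\
&= \Gamma^{<\alpha} \cup \Gamma^{\alpha} && \text{by I.H.}\\
&= \Gamma^{\alpha} && \text{by the choice of } \alpha
\end{aligned}$$

Hence,

$$\begin{aligned}
\Gamma^{\beta} &= \Gamma^{<\beta} \cup \Gamma(\Gamma^{<\beta})\\
&= \Gamma^{<\alpha} \cup \Gamma(\Gamma^{<\alpha})\\
&= \Gamma^{\alpha}
\end{aligned}$$

In particular, $\overline{\Gamma} = \bigcup_{\beta} \Gamma^{\beta} = \Gamma^{<\alpha} = \Gamma^{\alpha}$, for the above $\alpha$.

As a by-product, $\overline{\Gamma}$ is a set.

(2): Since $\Gamma^{\alpha} = \Gamma^{<\alpha} \cup \Gamma(\Gamma^{<\alpha})$, it follows that $\Gamma(\Gamma^{<\alpha}) \subseteq \Gamma^{\alpha}$; hence,

$$\Gamma(\overline{\Gamma}) \subseteq \overline{\Gamma} \tag{i}$$

by $\overline{\Gamma} = \Gamma^{\alpha} = \Gamma^{<\alpha}$, with $\alpha$ as above. Next, as an I.H., assume that

$$\Gamma^{<\beta} \subseteq \Gamma(\overline{\Gamma})$$

Now,

$$\Gamma(\overline{\Gamma}) = \Gamma\Bigl(\bigcup_{\gamma} \Gamma^{\gamma}\Bigr) \supseteq \Gamma\Bigl(\bigcup_{\gamma < \beta} \Gamma^{\gamma}\Bigr) \quad\text{by monotonicity of } \Gamma \tag{ii}$$

Thus, by I.H. and (ii), $\Gamma^{\beta} = \Gamma^{<\beta} \cup \Gamma(\Gamma^{<\beta}) \subseteq \Gamma(\overline{\Gamma})$.
Therefore $\Gamma^{\beta} \subseteq \Gamma(\overline{\Gamma})$ for all $\beta$; hence $\overline{\Gamma} \subseteq \Gamma(\overline{\Gamma})$. This settles the issue,
by (i).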


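VII.7.17 Corollary. Let $X$ be a set, and $\Gamma : \mathcal{P}(X) \to \mathcal{P}(X)$ be a monotone
operator. Then the following are provable (cf. VI.5.47):

(1) $\overline{\Gamma}$ is a set.
(2) There is an $\alpha$ such that $\Gamma^{<\alpha} = \Gamma^{\alpha}$. Moreover, $\overline{\Gamma} = \Gamma^{\alpha} = \Gamma^{\beta}$ for $\beta \ge \alpha$.
(3) The $\alpha$ of (2) satisfies $\mathrm{Card}(\alpha) \le \mathrm{Card}(X)$.
(4) $\overline{\Gamma}$ is a fixed point of $\Gamma$, that is, $\Gamma(\overline{\Gamma}) = \overline{\Gamma}$.


Proof. (1): This is trivial, since $\Gamma^{\alpha} \subseteq X$ for all $\alpha$, and hence $\overline{\Gamma} = \bigcup_{\alpha} \Gamma^{\alpha} \subseteq X$.

(2): By the proof of VI.5.47 (see also the remark following that proof).
Alternatively, the function $f = \lambda s. \min\{\alpha : s \in \Gamma^{\alpha}\}$, that is, the one that maps
each $s \in \overline{\Gamma}$ to its level, is a set by (1). Let $\alpha = \sup^{+} \mathrm{ran}(f)$. Then

$$\begin{aligned}
\Gamma^{\alpha} &= \bigcup_{\beta < \alpha} \Gamma^{\beta} \cup \Gamma\Bigl(\bigcup_{\beta < \alpha} \Gamma^{\beta}\Bigr)\\
&= \bigcup_{\beta < \alpha} \Gamma^{\beta} && \text{since every } s \in \Gamma(\Gamma^{<\alpha}) \text{ is in some } \Gamma^{\beta},\ \beta < \alpha\\
&= \Gamma^{<\alpha}
\end{aligned}$$

The rest follows from VII.7.16.

(3): Since the function $f : \overline{\Gamma} \to \alpha$ of (2) is onto, and $\overline{\Gamma} \subseteq X$, it follows that
$\mathrm{Card}(\alpha) \le \mathrm{Card}(\overline{\Gamma}) \le \mathrm{Card}(X)$.

(4): From VII.7.16.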


VII.7.18 Remark. (1) For an arbitrary monotone operator $\Gamma$, the smallest $\alpha$
such that $\Gamma^{<\alpha} = \Gamma^{\alpha} = \overline{\Gamma}$, if it exists, is called the ordinal of $\Gamma$ and is often
denoted by $|\Gamma|$.

(2) We next relax the concept of rule set to allow also (proper) rule classes.
However, we require some restriction on the size of the left hand sides of
rules.


VII.7.19 Definition. An $m$-based rule class (possibly proper) $R$, where $m$ is
regular, is a left narrow class of rules such that for every rule $A \mapsto a$ of $R$, we
have $\mathrm{Card}(A) < m$. An $\omega$-based rule is called finitary.


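VII.7.20 Proposition. If $\Gamma$ is a monotone operator (possibly a proper class)
defined from an $m$-based rule class $R$ by $\Gamma(A) = \{x : (\exists Y \subseteq A)\, Y \mapsto x\}$ for all
$A$, then

(1) $|\Gamma| \le m$,
(2) $\mathrm{Cl}(R) = \overline{\Gamma}$.

$\Gamma(A)$ is a set by left narrowness.


Proof. (1): Since $\Gamma^{m} \supseteq \Gamma^{<m}$, it suffices to show that $\Gamma^{m} \subseteq \Gamma^{<m}$. Let then
$x \in \Gamma^{m}$. Thus, either $x \in \Gamma^{<m}$, in which case there is nothing to prove, or
$x \in \Gamma(\Gamma^{<m})$. Thus, for some $A \subseteq \Gamma^{<m}$, we have $\mathrm{Card}(A) < m$ and $A \mapsto x$ is
an $R$-rule. By VII.6.10, $A \subseteq \Gamma^{<\alpha}$ for some $\alpha < m$; thus $x \in \Gamma(\Gamma^{<\alpha}) \subseteq \Gamma^{<m}$.

(2): By VII.7.16, $\Gamma(\overline{\Gamma}) = \overline{\Gamma} = \Gamma^{m}$. Thus, $\overline{\Gamma}$ is an $R$-closed set; hence $\mathrm{Cl}(R)$
exists (i.e., is a set). Indeed,

$$\mathrm{Cl}(R) \subseteq \overline{\Gamma} \tag{i}$$

On the other hand, assume $\Gamma^{<\alpha} \subseteq \mathrm{Cl}(R)$. It follows ($\mathrm{Cl}(R)$ is $R$-closed) that

$$\Gamma(\Gamma^{<\alpha}) \subseteq \mathrm{Cl}(R)$$

and hence (by induction)

$$\Gamma^{\alpha} \subseteq \mathrm{Cl}(R) \quad\text{for all } \alpha$$

which promotes (i) to equality.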
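
For a small finitary rule set the equality between the closure and the set reached by iterating the associated operator can be checked by brute force. The following Python sketch uses a hypothetical rule set `RULES` on $X = \{0, \ldots, 5\}$, chosen only for illustration; `gamma` is the operator that collects every conclusion whose premises lie in its argument.

```python
from itertools import combinations

X = range(6)
# Hypothetical finitary rules (premises, conclusion); each premise set is finite.
RULES = [
    (frozenset(), 0),
    (frozenset({0}), 1),
    (frozenset({0, 1}), 2),
    (frozenset({3}), 4),
]


def gamma(A):
    """Operator of the rule set: conclusions of all rules whose premises lie in A."""
    return frozenset(x for premises, x in RULES if premises <= A)


def closure_by_intersection():
    """Intersection of all closed subsets Z of X, i.e. those with gamma(Z) ⊆ Z."""
    result = frozenset(X)
    for size in range(len(X) + 1):
        for members in combinations(X, size):
            Z = frozenset(members)
            if gamma(Z) <= Z:
                result &= Z
    return result


def closure_by_stages():
    """Iterate S ← S ∪ gamma(S) from the empty set until nothing new appears."""
    S = frozenset()
    while not gamma(S) <= S:
        S = S | gamma(S)
    return S


print(sorted(closure_by_intersection()))  # [0, 1, 2]
print(sorted(closure_by_stages()))        # [0, 1, 2]
```

The elements 3 and 4 stay out of the closure because no rule ever produces 3; the rule with premise 3 is never triggered.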
We apply these ideas to prove the important reflection principle. 


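VII.7.21 Lemma. For any formula $\mathcal{F}(y, \vec{x}_n)$ and set $N$, there is a set $M \supseteq N$
such that:

(1) The following is provable:

$$u_1 \in M \wedge \cdots \wedge u_n \in M \to \bigl((\exists y)\mathcal{F}(y, \vec{u}_n) \leftrightarrow (\exists y \in M)\mathcal{F}(y, \vec{u}_n)\bigr)$$

and
(2) $M$ can be chosen to satisfy $\mathrm{Card}(M) \le \max(\mathrm{Card}(N), \aleph_0)$.


Proof. (1): Define the $\omega$-based rule class

$$R = \{\{u_1, \ldots, u_n\} \mapsto y : \mathcal{F}(y, \vec{u}_n) \wedge y \text{ has least rank}\} \cup \{\emptyset \mapsto x : x \in N\}$$

$M = \mathrm{Cl}(R)$ is a set (by VII.7.20), satisfying $N \subseteq M$.
Let us take $u_i$, $i = 1, \ldots, n$, in $M$. The $\leftarrow$-direction of (1) is trivial. Let then

$$(\exists y)\mathcal{F}(y, \vec{u}) \tag{i}$$

Thus we may add $\mathcal{F}(a, \vec{u})$, where $a$ is a new constant and $\rho(a)$ is minimum.
Since $M$ is $R$-closed, $a \in M$; thus $(\exists y \in M)\mathcal{F}(y, \vec{u})$ by substitution axiom.

(2): Using AC, cut $R$ down to $T$ such that for each of the $n!$ permutations $\vec{u}$
of $u_1, \ldots, u_n$, where $u_i \in M$ (the $M$ above) we keep a unique $\{u_1, \ldots, u_n\} \mapsto y$
whenever $\mathcal{F}(y, \vec{u})$ (this $T$ is, of course, a set). Set $M' = \mathrm{Cl}(T)$.

First of all, for all $u_i \in M'$, $(\exists y)\mathcal{F}(y, \vec{u}) \leftrightarrow (\exists y \in M')\mathcal{F}(y, \vec{u})$, exactly as
in (1). Next, by VII.7.20,

$$M' = \bigcup_{p \in \omega} \Gamma^{p} \tag{ii}$$

where $\Gamma$ is the monotone operator associated with the rule set $T$ (recall, $T$ is
$\omega$-based, whence the choice of upper bound of $\bigcup$ in (ii)).

By induction on $p$ we argue that $\mathrm{Card}(\Gamma^{p}) \le \max(\mathrm{Card}(N), \aleph_0)$. Indeed,
this is true for $p = 0$, as $\Gamma^{0} = \Gamma(\Gamma^{<0}) = \Gamma(\emptyset) = N$. Now,

$$\begin{aligned}
\mathrm{Card}(\Gamma^{p+1}) &= \mathrm{Card}\Bigl(\bigcup_{i \le p} \Gamma^{i} \cup \Gamma\Bigl(\bigcup_{i \le p} \Gamma^{i}\Bigr)\Bigr)\\
&\le \mathrm{Card}\Bigl(\bigcup_{i \le p} \Gamma^{i}\Bigr) +_c \mathrm{Card}\Bigl(\Gamma\Bigl(\bigcup_{i \le p} \Gamma^{i}\Bigr)\Bigr)
\end{aligned} \tag{iii}$$

By the I.H.,

$$\mathrm{Card}\Bigl(\bigcup_{i \le p} \Gamma^{i}\Bigr) \le (p + 1) \cdot_c \max(\mathrm{Card}(N), \aleph_0) = \max(\mathrm{Card}(N), \aleph_0) \tag{iv}$$

Also, setting $S = \bigcup_{i \le p} \Gamma^{i}$,

$$\begin{aligned}
\mathrm{Card}(\Gamma(S)) &= \mathrm{Card}(\{y : (\exists \vec{u} \in S)(\{u_1, \ldots, u_n\} \mapsto y \text{ is in } T)\})\\
&\le (\mathrm{Card}(S))^{n} && \text{since } T \text{ is single-valued in } y\\
&\le \max(\mathrm{Card}(N), \aleph_0) && \text{by (iv)}
\end{aligned} \tag{v}$$

By (iii)-(v) the induction is complete. Thus, by (ii),

$$\mathrm{Card}(M') \le \aleph_0 \cdot_c \max(\mathrm{Card}(N), \aleph_0) = \max(\mathrm{Card}(N), \aleph_0)$$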
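
The closure construction in the proof can be imitated in a finite toy setting. The sketch below is only an analogy under stated assumptions: the "universe" `U` is the set of integers below 100, the formula `F(y, u)` says "$y$ is a prime greater than $u$", and "least rank" is replaced by "least element of `U`". None of these names come from the text. Starting from $N$, the hull adds one least witness per parameter until it is closed, and the existential statement is then reflected.

```python
def is_prime(k: int) -> bool:
    return k >= 2 and all(k % d for d in range(2, int(k ** 0.5) + 1))


U = range(100)  # toy universe (assumption for this illustration)


def F(y: int, u: int) -> bool:
    """Toy formula F(y, u): y is a prime larger than u."""
    return y > u and is_prime(y)


def hull(N):
    """Close N under the rule {u} ↦ (least y in U with F(y, u))."""
    M = set(N)
    frontier = list(N)
    while frontier:
        u = frontier.pop()
        witness = next((y for y in U if F(y, u)), None)  # least witness, if any
        if witness is not None and witness not in M:
            M.add(witness)
            frontier.append(witness)
    return M


M = hull({10})
# Reflection for the single formula F: a witness exists in U iff one exists in M.
reflected = all(
    any(F(y, u) for y in U) == any(F(y, u) for y in M) for u in M
)
print(len(M), reflected)  # 22 True
```

Only one witness is kept per parameter, which mirrors the step of the proof that cuts the rule class down to a single-valued one in order to bound the cardinality.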


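VII.7.22 Corollary. Lemma VII.7.21 holds if we have a finite number of formulas
$\mathcal{F}_i$, $i = 1, \ldots, m$. That is, for any set $N$, there is a set $M \supseteq N$ such that:

(1) The following is provable for each $i = 1, \ldots, m$:

$$u_1 \in M \wedge \cdots \wedge u_{n_i} \in M \to \bigl((\exists y)\mathcal{F}_i(y, \vec{u}_{n_i}) \leftrightarrow (\exists y \in M)\mathcal{F}_i(y, \vec{u}_{n_i})\bigr)$$

and
(2) $M$ can be chosen to satisfy $\mathrm{Card}(M) \le \max(\mathrm{Card}(N), \aleph_0)$.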




Lemma VII.7.21 also holds for any set of formulas that can be indexed within the
theory. However, for an arbitrary (infinite) set of formulas (not indexed within
the theory) the lemma breaks down. It is still true metamathematically, though,
since, arguing in the metatheory, we can index this (enumerable) set of formulas
using $\mathbb{N}$ as index set. Of course, the $R$ so obtained (by “put $\{u_1, \ldots, u_n\} \mapsto y$
in $R$ as long as $\mathcal{F}_i(y, \vec{u})$ for some $i \in \mathbb{N}$, and $y$ has least rank”) is still $\omega$-based.


The proof technique in VII.7.21 (and the flavour of the result in VII.7.23) is
analogous to that employed towards the downward Löwenheim-Skolem theorem
of model theory (proved in volume 1 of these lectures).


We next apply VII.7.22 to show that for any finite set of formulas, there is a 
set M such that each of these formulas is absolute for M. We say that M reflects 
these formulas. 


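VII.7.23 Theorem (Reflection Principle). For any set $N$ and any finite set of
formulas $\mathcal{F}_i$, $i = 1, \ldots, m$, there is a set $M \supseteq N$ such that

$$\vdash_{\mathrm{ZFC}} \mathcal{F}_i \leftrightarrow \mathcal{F}_i^{M} \quad\text{for } i = 1, \ldots, m$$

assuming $u_k \in M$ for all the free variables $u_k$.

Moreover, an $M$ with cardinality at most $\max(\mathrm{Card}(N), \aleph_0)$ exists.


Proof. Let $\mathcal{G}_j$, $j = 1, \ldots, r$, be the list of all formulas that consists of the list
$\mathcal{F}_i$, $i = 1, \ldots, m$, augmented by all subformulas of the $\mathcal{F}_i$. If none of the $\mathcal{G}_j$ is
of the form $(\exists y)\mathcal{Q}$, then take $M = N$. Otherwise, take $M \supseteq N$, using VII.7.22
on all formulas of the form $(\exists y)\mathcal{Q}$ in the $\mathcal{G}_j$-list.

By induction on formulas we show next that

$$\vdash_{\mathrm{ZFC}} \mathcal{G}_j \leftrightarrow \mathcal{G}_j^{M} \quad\text{for } j = 1, \ldots, r \tag{1}$$

from which the theorem follows.

If $\mathcal{G}_j$ is atomic, then (1) follows by VI.8.1 if any $a \in N$, added as a constant
to $L_{\mathrm{Set}}$, relativizes as $a$. If $\mathcal{G}_j$ is $\neg\mathcal{A}$ or $\mathcal{A} \vee \mathcal{B}$, then the I.H. guarantees that

$$\vdash_{\mathrm{ZFC}} \mathcal{A} \leftrightarrow \mathcal{A}^{M}$$

and

$$\vdash_{\mathrm{ZFC}} \mathcal{B} \leftrightarrow \mathcal{B}^{M}$$

from which, using VI.8.1 and the Leibniz rule, we get

$$\vdash_{\mathrm{ZFC}} \neg\mathcal{A} \leftrightarrow (\neg\mathcal{A})^{M}$$

and

$$\vdash_{\mathrm{ZFC}} \mathcal{A} \vee \mathcal{B} \leftrightarrow (\mathcal{A} \vee \mathcal{B})^{M}$$

Finally, let $\mathcal{G}_j$ be $(\exists y)\mathcal{Q}$. We get

$$\begin{aligned}
\vdash_{\mathrm{ZFC}} (\exists y)\mathcal{Q} &\leftrightarrow (\exists y \in M)\mathcal{Q} && \text{by VII.7.22}\\
&\leftrightarrow (\exists y \in M)\mathcal{Q}^{M} && \text{by I.H. and Leibniz rule}\\
&\leftrightarrow ((\exists y)\mathcal{Q})^{M} && \text{by VI.8.1}
\end{aligned}$$

(1) is proved, and we are done.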


A consequence of the reflection principle is that ZFC cannot be finitely
axiomatized if it is consistent.† That is, there is no finite set of sentences $\mathcal{F}_i$,
$i = 1, \ldots, n$, such that for every formula $\mathcal{Q}$,

$$\vdash_{\mathrm{ZFC}} \mathcal{Q} \quad\text{iff}\quad \mathcal{F}_1, \ldots, \mathcal{F}_n \vdash \mathcal{Q} \tag{1}$$

This is so because a consistent ZFC using (1) can prove the existence of a
(set) model for itself (i.e., one for $\{\mathcal{F}_1, \ldots, \mathcal{F}_n\}$). Thus, ZFC can prove its
consistency (cf. I.7.8):

$$\vdash_{\mathrm{ZFC}} (\exists M)(\neg U(M) \wedge \mathcal{F}_1^{M} \wedge \cdots \wedge \mathcal{F}_n^{M})$$

by VII.7.23 and tautological implication‡ with help from the Leibniz rule. This
is contrary to Gödel’s second incompleteness theorem.


As a by-product of this observation, we also conclude that an extension
of VII.7.21 to an arbitrary set of formulas not only does not follow (in ZFC)
from our (Löwenheim-Skolem) proof technique, but is downright impossible.


Now, working in the metatheory, we can mimic the construction that builds
the model $U_A$, for some set of urelements $A$. Continuing in the metatheory, we
can apply reflection (to the set of axioms, which is enumerable in the metatheory) and
“cut $U_A$ down” to an enumerable $(U, \in)$-model $(M, U, \in)$.§ We can next apply
Mostowski collapsing ($\lambda x.C(M, x) : M \to C(M, M)$; see VI.2.38, p. 312) to
get an $\in$-isomorphic transitive set structure $(C(M, M), U, \in)$ which is also a
model, since $\lambda x.C(M, x)$ preserves atoms and the $\in$-relation.¶


† ZFC, as given, indeed has infinitely many axioms. For example, collection provides one axiom for
each formula. On the other hand, if ZFC is inconsistent, then certainly all its theorems (which
happen to be all formulas under these circumstances) follow from the single axiom $(\forall x)\, x \ne x$.

‡ If $\vdash_{\mathrm{ZFC}} \mathcal{A} \leftrightarrow \mathcal{A}^{M}$ and $\vdash_{\mathrm{ZFC}} \mathcal{A}$, then $\vdash_{\mathrm{ZFC}} \mathcal{A}^{M}$.

§ Take $N$ in the proof of VII.7.23 to be enumerable.

¶ See VI.2.36-VI.2.39. Of course, $M$ is extensional, being a ZFC $(U, \in)$-model.

Thus, Platonistically, we have shown that a so-called countable transitive
model (CTM), namely $(C(M, M), U, \in)$, for ZFC exists.

This provides the source for the so-called “Skolem paradox” (not a real
paradox): In ZFC we can prove the existence of sets of enormous cardinality.
In particular, we can prove (Cantor’s theorem) that

$$\omega \not\sim \mathcal{P}(\omega) \tag{1}$$

However, it is “really true” that

$$\omega^{C(M,M)} \sim \mathcal{P}(\omega)^{C(M,M)} \tag{2}$$

since both sets are enumerable. Isn’t this a contradiction?

We know better by now. (2) is irrelevant, for it is not equivalent to
$(\omega \sim \mathcal{P}(\omega))^{C(M,M)}$, due to the presence of an unbounded existential quantifier. As far
as an inhabitant of $C(M, M)$ is concerned, he sees that

$$\models_{C(M,M)} \omega \not\sim \mathcal{P}(\omega)$$

and this is as it should be by (1). This person cannot see the 1-1 correspondence
$f$ that effects (2), for $f$ is not in $C(M, M)$. Note that the expression immediately
above says the same thing as (cf. VI.8.4)

$$\models_{U_A} (\omega \not\sim \mathcal{P}(\omega))^{C(M,M)}$$


Consistency of GCH with ZF. We conclude this chapter with a proof that $L_N$
is a model of GCH. Which $L_N$? We add to ZF the new constants $N$, $f$ and the
axiom

$$\neg U(N) \wedge (\forall z \in N)U(z) \wedge f \text{ is a 1-1 function} \wedge \mathrm{dom}(f) \subseteq \omega \wedge \mathrm{ran}(f) = N$$

To avoid unnecessary linguistic (and notational) acrobatics we call this conservative
extension of ZF just ZF. Then we build $L_N$ in ZF as before.


We also bypass tedious relativizations to $L_N$ by working in ZF + (V = L)
or a conservative extension thereof throughout (cf. VI.9.18). Thus, rather than
proving $\vdash_{\mathrm{ZF}} \mathrm{GCH}^{L_N}$, we prove GCH in ZF + (V = L) instead. The key lemma
is the following:


VII.7.24 Lemma. In ZF + (V = L) we can prove that if $A$ is a transitive set
and $\mathrm{Card}(A) \le \aleph_\alpha$, then $A \subseteq \{F_\beta : \beta < \aleph_{\alpha+1}\}$.


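Once the above is settled, one can easily prove:

VII.7.25 Theorem. GCH is provable in ZF + (V = L).


Proof. Let $S \subseteq \aleph_\alpha$. Thus, $A = \aleph_\alpha \cup \{S\}$ is transitive, for $x \in A$ leads to two
cases: $x = S$ (and we are done by $S \subseteq \aleph_\alpha$), or $x \in \aleph_\alpha$ (but $\aleph_\alpha$ is transitive). Moreover,
$\mathrm{Card}(A) \le \aleph_\alpha +_c 1 = \aleph_\alpha$. Thus, $A \subseteq \{F_\beta : \beta < \aleph_{\alpha+1}\}$ by VII.7.24, from
which

$$S \in \{F_\beta : \beta < \aleph_{\alpha+1}\}$$

Therefore

$$\mathcal{P}(\aleph_\alpha) \subseteq \{F_\beta : \beta < \aleph_{\alpha+1}\}$$

Hence

$$\mathrm{Card}(\mathcal{P}(\aleph_\alpha)) \le \mathrm{Card}(\{F_\beta : \beta < \aleph_{\alpha+1}\}) \le \aleph_{\alpha+1}$$

(the last $\le$ is due to the onto map $\aleph_{\alpha+1} \ni \beta \mapsto F_\beta$). Since $\aleph_{\alpha+1} \le \mathrm{Card}(\mathcal{P}(\aleph_\alpha))$,
we are done.


VII.7.26 Corollary. If ZF is consistent, then so are ZF + GCH and
ZFC + GCH.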


Proof of Lemma VII.7.24. Recall that, working in ZF + (V = L), we get AC
for free; therefore all our work on cardinals is available to us. The proof is an
application of reflection followed by Mostowski collapsing. Freeze then sets $A$
and $m$ along with the assumptions

$$A \text{ is transitive and } \mathrm{Card}(A) \le m,\ m \text{ being an infinite cardinal}^{\dagger} \tag{1}$$

It is convenient to work in a conservative extension $\mathcal{T}$ of ZF + (V = L) for a
while.

Let us denote by $L$ the language of ZF + (V = L) (this includes the constants
$N$ and $f$). $\mathcal{T}$ is obtained from ZF + (V = L) by adding to $L$ a new constant $B$
and the axioms

$$\{f, N\} \cup N \cup \mathrm{TC}(f) \cup A \subseteq B \wedge \neg U(B) \tag{2}$$

$$\mathrm{Card}(B) \le m \tag{3}$$

and the schema

$$\mathcal{A} \leftrightarrow \mathcal{A}^{B} \quad\text{for every sentence } \mathcal{A} \text{ of } L \tag{4}$$

where $N^{B} = N$ and $f^{B} = f$. Note that (2) and (3) are relatively consistent by
(1) and the choice of $N$.

† Recall that “freezing”, jargon we have applied constantly towards invoking the deduction theorem,
formally means to add new set constants, $A$ and $m$, and the assumptions (1).


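To see that $\mathcal{T}$ is as claimed, let $\vdash_{\mathcal{T}} \mathcal{A}$, where $\mathcal{A}$ is over $L$. Fix attention
to one such proof, and let $\mathcal{F}_1, \ldots, \mathcal{F}_n$ be the universal closures of all the
axioms among (2)-(4) appearing in it. Thus, ZF + (V = L), although over
an extended language $L'$ that includes the constant $B$, proves

$$\mathcal{F}_1 \wedge \cdots \wedge \mathcal{F}_n \to \mathcal{A} \tag{5}$$

by the deduction theorem. Therefore (cf. I.4.16), we have a proof in ZF +
(V = L) of $\mathcal{F}_1' \wedge \cdots \wedge \mathcal{F}_n' \to \mathcal{A}$ (this formula being over $L$), where $\mathcal{F}_i'$
is obtained from $\mathcal{F}_i$ by replacing all occurrences of $B$ in it by a new variable
$z$ (the same $z$ is used in all the $\mathcal{F}_i$) that does not occur as either free or bound
in (5). By $\exists$-introduction,

$$\vdash_{\mathrm{ZF}+(V=L)} (\exists z)(\mathcal{F}_1' \wedge \cdots \wedge \mathcal{F}_n') \to \mathcal{A} \tag{6}$$

However,

$$\vdash_{\mathrm{ZF}+(V=L)} (\exists z)(\mathcal{F}_1' \wedge \cdots \wedge \mathcal{F}_n')$$

by VII.7.23 and $\mathrm{Card}(\{f, N\} \cup N \cup \mathrm{TC}(f) \cup A) \le m$ by $N$ being countable.
VII.7.23 is applicable because only finitely many sentences (from schema
(4)) are involved in $\mathcal{F}_1' \wedge \cdots \wedge \mathcal{F}_n'$. By (6) we now have a proof of $\mathcal{A}$ in
ZF + (V = L) which establishes the conservative nature of the extension $\mathcal{T}$.

$\mathfrak{B} = (L', \mathcal{T}, B)$ is a formal $(U, \in)$-model of ZF + (V = L), where $L'$ is $L$
with the addition of $B$. Indeed, if $\mathcal{F}$ is the universal closure of a ZF + (V = L)
axiom, then $\vdash_{\mathcal{T}} \mathcal{F}$, since $\mathcal{T}$ is an extension of ZF + (V = L). By (4), $\vdash_{\mathcal{T}} \mathcal{F}^{B}$.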

We prefer to have a transitive (set) model, so we invoke Mostowski collapsing
(cf. VI.2.38). Since $B$ (argot for $\mathfrak{B}$) is a model of ZF + (V = L), it satisfies, in
particular, extensionality. That is, it is extensional in the sense of VI.2.38. We
have shown there that the unary function $\phi$ introduced in the theory $\mathcal{T}$ over $L'$
by the recursive definition†

$$\phi(x) = \begin{cases} x & \text{if } U(x)\\ \{\phi(y) : y \in B \wedge y \in x\} & \text{otherwise} \end{cases} \tag{7}$$

satisfies, for extensional $B$,

$$\vdash_{\mathcal{T}'} x \in B \wedge y \in B \to (x \in y \leftrightarrow \phi(x) \in \phi(y)) \tag{8}$$

$$\vdash_{\mathcal{T}'} \phi \text{ is 1-1} \tag{9}$$

and

$$\vdash_{\mathcal{T}'} \mathrm{ran}(\phi) \text{ is transitive}^{\ddagger} \tag{10}$$

By the first case of (7)

$$\vdash_{\mathcal{T}'} x \in B \to (U(x) \leftrightarrow U(\phi(x))) \tag{11}$$

where $\mathcal{T}'$ is the conservative extension of $\mathcal{T}$ obtained by adding the introducing
axiom for $\phi$.§ Its language is $L''$, that is, $L'$ with $\phi$ added.

† $\phi(x) = C(B, x)$ in the notation of VI.2.36. $N$ and $f$ are fixed points of $\phi$. (Why?)

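Now, $\mathfrak{B}' = (L'', \mathcal{T}', B)$ is also a model of ZF + (V = L), and (8)-(11) yield
a formal isomorphism (cf. p. 84) between $\mathfrak{B}'$ and the transitive interpretation of
$L$, $\mathfrak{J} = (L'', \mathcal{T}', \mathrm{ran}(\phi))$.¶ In fact $\mathfrak{J}$ is a formal $(U, \in)$-model of ZF + (V = L).
To see this we employ I.7.12 to obtain

$$\vdash_{\mathcal{T}'} \mathcal{A}^{B} \leftrightarrow \mathcal{A}^{\mathrm{ran}(\phi)} \tag{12}$$

for all sentences over $L$. This and (4) entail

$$\vdash_{\mathcal{T}'} \mathcal{A} \leftrightarrow \mathcal{A}^{\mathrm{ran}(\phi)} \tag{13}$$

for all $L$-sentences. If now $\mathcal{F}$ is the universal closure of a ZF + (V = L) axiom,
then $\vdash_{\mathcal{T}'} \mathcal{F}$; hence, by (13), we also have $\vdash_{\mathcal{T}'} \mathcal{F}^{\mathrm{ran}(\phi)}$.

We note two more facts:

$$\vdash_{\mathcal{T}'} \mathrm{Card}(\mathrm{ran}(\phi)) = \mathrm{Card}(B) \le m \quad\text{(the equality by (9))} \tag{14}$$

$$\vdash_{\mathcal{T}'} \{N\} \cup N \cup A \subseteq \mathrm{ran}(\phi) \tag{15}$$

the latter by Exercise VI.6, since $\{N\} \cup N \cup A$ is transitive. By (15) and results
in Sections VI.8 and VI.9, $\alpha \mapsto F_\alpha$ is absolute for $\mathrm{ran}(\phi)$ (i.e., for $\mathfrak{J}$); hence so
is ord, since it is introduced by the explicit definition‖

$$\mathrm{ord}(x) = \min\{\alpha : x = F_\alpha\}$$

Then

$$\vdash_{\mathcal{T}'} x \in A \to \mathrm{ord}(x) \in \mathrm{ran}(\phi) \quad\text{(using (15))}$$

Hence, by transitivity of $\mathrm{ran}(\phi)$,

$$\vdash_{\mathcal{T}'} x \in A \to \mathrm{ord}(x) \subseteq \mathrm{ran}(\phi)$$

Further, bringing (14) in,

$$\vdash_{\mathcal{T}'} x \in A \to \mathrm{Card}(\mathrm{ord}(x)) \le m \tag{16}$$

Let now $x \in A$. Suppose also $m^{+} \le \mathrm{ord}(x)$. Thus ($\le$ is $\subseteq$), $m^{+} \le \mathrm{Card}(\mathrm{ord}(x))$,
contradicting (16). Hence $\mathrm{ord}(x) < m^{+}$, and therefore $x \in \{F_\beta : \beta < m^{+}\}$. We
have shown

$$\vdash_{\mathcal{T}'} A \subseteq \{F_\beta : \beta < m^{+}\}$$

Invoking the deduction theorem and remembering the assumptions (1) (with
frozen variables) that we made at the beginning of the proof, we have a proof in
$\mathcal{T}'$ of “if $A$ is a transitive set, $m$ an infinite cardinal, and $\mathrm{Card}(A) \le m$, then
$A \subseteq \{F_\beta : \beta < m^{+}\}$”. This is precisely the statement of the lemma, and we are done,
for this argot can be trivially translated into a formula over $L$. The conservatism
of $\mathcal{T}'$ means that we have proved the quoted statement in ZF + (V = L).

‡ This does not need the extensional nature of $B$. By the way, $\vdash_{\mathcal{T}'} \mathrm{ran}(\phi) = C(B, B)$ in the
notation of VI.2.38.
§ Extensions by definitions are conservative. Cf. I.6.
¶ Where $U$, $\in$, $N$ and $f$ are interpreted as themselves, just as in $\mathfrak{B}'$. Cf. footnote to the definition
of $\phi$.
‖ $\mathrm{ord}(x) = \alpha \leftrightarrow (x = F_\alpha \wedge (\forall \beta \in \alpha)\, x \ne F_\beta)$, etc.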


VII.8. Exercises 


VII.1. Show that $\omega \sim \omega + 2$.

VII.2. Show that if $A \sim B$, then $A$ is finite iff $B$ is finite.

VII.3. Fill in the missing details in the proof of Proposition VII.1.19.

VII.4. Show that the concatenation of any finite number of (.7, .¥ )-derivations 
is a (Y,.F )-derivation. 

VII.5. Prove, using first Definition VII.1.32 and then Definition VII.1.3, that
for any $x$ and $y$ such that $x \ne y$, both $\{x\}$ and $\{x, y\}$ are finite. Use the
second method to also compute their cardinality.

VII.6. Fill in any missing details in the proof of Proposition VII.1.35.

VII.7. Show, using induction on WR-finite sets, that if $A$ is WR-finite and $f$
is a function, then $f[A]$ is WR-finite.

VII.8. Show that every natural number is WR-finite.
(Hint. Induction on WR-finite sets.)

VII.9. Prove that if $A$ is WR-finite and $B \subseteq A$, then $B$ is WR-finite. (Do not
use the equivalence of finite with WR-finite.)

VII.10. Prove, by induction on finite sets, that if $A$ is finite and $<$ is a partial
order on $A$, then $A$ has both a $<$-minimal and a $<$-maximal element.

VII.11. Using the previous problem, give an alternative proof that $\omega$ is infinite.
(Hint. Consider the partial order $\in$ on $\omega$.)

VII.12. If $|A| = n + 1$ and $a \in A$, then $|A - \{a\}| = n$.


VII.13. If $A$ and $B$ are finite, then so are $A \cap B$, $A \cup B$, $A - B$, and $A \times B$.
(Hint. It is convenient to use induction on finite sets for the cases “$\cup$”
and “$\times$”.)

VII.14. If $A$ is finite, then $\mathcal{P}(A)$ is finite. In fact, show that if $|A| = n$, then
$|\mathcal{P}(A)| = 2^{n}$.
(Hint. Use induction on $n$. Alternatively, count the subsets of $A$ by
counting their characteristic functions.†)

† If $A \subseteq X$ and the set $X$ is fixed throughout the discussion, then the characteristic function of $A$
(with respect to $X$ being understood) is the function $\chi_A = \lambda x.\text{if } x \in A \text{ then } 0 \text{ else } 1$.

VII.15. Show that if $A$ is an infinite set and $B$ is finite, then $A \cup B$, $A \times B$, and
$A - B$ are also infinite, whereas $A \cap B$ is finite.

VII.16. Show that if $A$ is enumerable and $B$ is finite, then $A \cup B$ and $A - B$
are also enumerable.

VII.17. Show that a set is finite iff all its proper subsets are finite.

VII.18. Without using the notion of WR-finite, show that for any finite set $A$
and any function $f$, $f[A]$ is finite.

VII.19. Show that the set of finite subsets of $\omega$ is enumerable.
(Hint. Identify these sets with their characteristic functions.)

VII.20. Show that the function $f$, defined in the proof of Theorem VII.2.5, is
strictly increasing.

VII.21. Give a proof of Corollary VII.2.6 without the help of V.3.9; in particular,
without the help of the axiom of choice.

VII.22. Show that if $A$ is countable and $f : A \to B$ is onto, then $B$ is countable.

VII.23. Show that the enumeration pictured in Example VII.2.17 (p. 448) is
given by the function $f^{-1}$, where

$$f = \lambda xy.\frac{(x + y)(x + y + 1)}{2} + y$$

Also show that $f$ is a 1-1 correspondence $\omega^{2} \sim \omega$.

VII.24. Show that $\lambda xy.2^{x}(2y + 1) - 1$ provides a 1-1 correspondence $\omega^{2} \sim \omega$.
(Hint. Relate this to the prime factorization theorem, and observe that
all primes except 2 are odd.)

VII.25. Show that the function $f : \omega^{2} \to \omega$ given by $f(x, y) = (x + y)^{2} + y$
is 1-1 but not onto.
(Hint. For “not onto” find an example. For 1-1, find a $g$ such that
$g \circ f = 1$ (see Proposition V.3.4). Finding $g : \omega \to \omega^{2}$ amounts to
solving the equation $z = (x + y)^{2} + y$ for (unique, of course) $x$ and $y$
in $\omega$. Why?)

VII.26. Fill in the missing details in the argument of Example VII.2.20.

VII.27. An algebraic number is a real root of a polynomial with integer
coefficients. For example, $\sqrt{2}$ is algebraic (root of $x^{2} - 2 = 0$). Each
$n \in \mathbb{Z}$ is algebraic. It is known that the number $\pi$ is not algebraic.
(Non-algebraic numbers are called transcendental.) Show that the set
of all algebraic numbers is enumerable.
(Hint. Use the fact that a polynomial of degree $n$ can have at most $n$
real roots.)

VII.28. Prove Theorem VII.2.25 only with the aid of Lemma VII.2.23.
(Hint. If $A$ is infinite, then it has an enumerable subset $B$, by Lemma
VII.2.23. Let $b \in B$. Then $B - \{b\}$ is a proper enumerable subset of
$B$, by Exercise VII.16.)

VII.29. Without the help of the axiom of choice, show that the set of irrational
numbers, $\mathbb{R} - \mathbb{Q}$, is equipotent with $\mathbb{R}$.
(Hint. Find, in $\mathbb{R} - \mathbb{Q}$, a set of irrational numbers equinumerous
with $\mathbb{Q}$.)

VII.30. Prove Corollary VII.3.3.
(Hint. This follows from the technique of Example VII.3.1. Define $d$
so that it is a member of ${}^{\omega}2$.)

VII.31. Fill in the missing details in Example VII.3.5.

VII.32. Show that every number in $[0, 1]$ with a finite binary expansion is
rational.

VII.33. Show that every non-zero number in $[0, 1]$ has an infinite binary
expansion.
(Hint. If it has a finite expansion, show that it also has an infinite
expansion.)

VII.34. Show that for any reals $a < b$, $(a, b) \sim (a, b] \sim [a, b) \sim [a, b] \sim \mathbb{R}$.

VII.35. Show that $(0, 1] \times (0, 1] \sim (0, 1]$, without the help of Cantor-Bernstein
theorem, as follows: Identify each real in $(0, 1]$ with its unique infinite
decimal expansion. At first, experiment as follows: Given $.a_0 a_1 a_2 \ldots a_i \ldots \in (0, 1]$,
form the pair $(.a_0 a_2 \ldots a_{2i} \ldots,\ .a_1 a_3 \ldots a_{2i+1} \ldots)$. For
example, $.140567\ldots$ yields $(.106\ldots, .457\ldots)$, and
$.551010\ldots$ (all 10’s after the first two digits) yields $(.511\ldots, .500\ldots)$
(all 1’s and all 0’s, respectively, after the first digit).


This last example shows that our tentative plan needs an amendment,
for the second component of the pair is not a “real” as we understand
them in this problem. Make an appropriate amendment to obtain a
1-1 correspondence.
(Hint. Revise the way you split $.a_0 a_1 a_2 \ldots a_i \ldots$ into the blocks that
you use to alternately build the two components of the pair, so that
each block contains a non-zero digit.)

VII.36. Show, using the previous problem, that $\mathbb{R}^{2} \sim \mathbb{R}$.

VII.37. Prove Proposition VII.4.11.

VII.38. Show that $\alpha \mapsto \aleph_\alpha$ is onto $\mathrm{Cn} - \omega$.

VII.39. Show that if $a \sim a'$ and $b \sim b'$, then $a \preceq b$ yields $a' \preceq b'$, and $a \prec b$
yields $a' \prec b'$.

VII.40. (Tarski.) Show without the help of AC that a set $x$ is infinite iff $\mathcal{P}(\mathcal{P}(x))$
contains an enumerable subset.
(Hint. Consider the function $f : \omega \to \mathcal{P}(\mathcal{P}(x))$ given by $f(n) =
\{a \in \mathcal{P}(x) : a \sim n\}$.)

VII.41. For any $\alpha$, $\beta$, show that $\mathrm{Card}(\alpha + \beta) = \mathrm{Card}(\alpha) +_c \mathrm{Card}(\beta)$.

VII.42. Show that $a < b$ does not, in general, imply $a +_c c < b +_c c$.

VII.43. Without using VII.5.14, prove that $a +_c \mathfrak{c} = \mathfrak{c}$ for all $a \le \mathfrak{c}$, where
$\mathfrak{c} = \mathrm{Card}(\mathbb{R})$.

VII.44. Prove VII.5.12 in a different way: Use induction on $n$.

VII.45. For any $\alpha$, $\beta$, show that $\mathrm{Card}(\alpha \cdot \beta) = \mathrm{Card}(\alpha) \cdot_c \mathrm{Card}(\beta)$.

VII.46. Show that for all $\alpha$, $\beta$, $\aleph_\alpha +_c \aleph_\beta = \aleph_\alpha \cdot_c \aleph_\beta = \aleph_{\alpha \cup \beta}$.

VII.47. Show that for all $a > 0$, $0^{a} = 0$.

VII.48. Show that $(a \cdot_c b)^{c} = a^{c} \cdot_c b^{c}$ for all $a$, $b$, $c$.

VII.49. Fill in all the missing details in the proof of VII.5.21.

VII.50. Compute $n^{\aleph_0}$ for all $n \in \omega$.

VII.51. Compute $\mathfrak{c}^{\aleph_0}$.

VII.52. Show that $\mathfrak{c} < \mathfrak{c}^{\mathfrak{c}}$.

VII.53. Compute $(\mathfrak{c}^{\mathfrak{c}})^{\mathfrak{c}}$ in terms of $\mathfrak{f} = \mathrm{Card}({}^{\mathbb{R}}\mathbb{R})$.

VII.54. Compute the cardinality of the set of all continuous real-valued functions
on $\mathbb{R}$.
(Hint. A continuous function is uniquely determined by its restriction
on $\mathbb{Q}$, the set of rational numbers.)

VII.55. Compute the cardinality of the set of all differentiable real-valued functions
on $\mathbb{R}$.

VII.56. Show that $\aleph_\alpha^{\aleph_\beta} = 2^{\aleph_\beta}$ on the assumption that $\alpha \le \beta$.
(Hint. Use VII.5.21 and VII.4.25.)

VII.57. Show that if $\mathfrak{k} \le \mathfrak{l}$, then $a^{\mathfrak{k}} \le a^{\mathfrak{l}}$.

VII.58. Prove that for any ordinal $\alpha$, $\mathrm{cf}(\aleph_{\alpha + \omega}) = \omega$.

VII.59. Prove that if $A_i \sim B_i$ for all $i \in I$, and if $A_i \cap A_j = \emptyset = B_i \cap B_j$
whenever $i \ne j$, then $\bigcup_{i \in I} A_i \sim \bigcup_{i \in I} B_i$.

VII.60. Prove that $\mathrm{Card}(\bigcup_{i \in I} A_i) \le \sum_{i \in I} \mathrm{Card}(A_i)$.

VII.61. Prove that $a_i \le b_i$ for all $i \in I$ implies $\sum_{i \in I} a_i \le \sum_{i \in I} b_i$.

VII.62. Prove that $\sum_{i \in m} \mathfrak{k}_i = m \cdot_c \sup_{i \in m} \mathfrak{k}_i$, if $\mathfrak{k}_i > 0$ for all $i$, and at least one
cardinal among $m$ and $\mathfrak{k}_i$ is infinite.

VII.63. Prove that an infinite cardinal $a$ is singular iff there are cardinals $b < a$
and $m_\lambda < a$, for all $\lambda < b$, such that $a = \sum_{\lambda < b} m_\lambda$.

VII.64. If $A_i \sim B_i$ for $i \in I$, then $\prod_{i \in I} A_i \sim \prod_{i \in I} B_i$.

VII.65. Show that $\mathrm{cf}(2^{\aleph_\alpha}) > \aleph_\alpha$ for any $\alpha$.

VII.66. (Bernstein.) Prove that $\aleph_n^{\aleph_\alpha} = 2^{\aleph_\alpha} \cdot_c \aleph_n$ for all $n \in \omega$ and all $\alpha$.

VII.67. Define the beth function, $\beth$, by the induction $\beth_0 = \aleph_0$, $\beth_{\alpha+1} = 2^{\beth_\alpha}$
and, if $\mathrm{Lim}(\alpha)$, $\beth_\alpha = \bigcup_{\beta < \alpha} \beth_\beta$. Show that $\beth$ has fixpoints.

VII.68. Show that if GCH holds, then $\beth_\alpha = \aleph_\alpha$ for all $\alpha$.

VII.69. Let $\mathrm{Card}(N) \le \omega$. Show that $\mathrm{Card}(V_N(\omega + \alpha)) = \beth_\alpha$ for all $\alpha$.

VII.70. Show that if $\mathrm{Lim}(\alpha)$, then $V_N(\alpha)$ is a model of ZFC less collection.

VII.71. Let $\mathrm{Card}(N) < a$, where $a$ is strongly inaccessible. Then the following
are absolute for $V_N(a)$:
(1) $\beta$ is a cardinal,
(2) $f : \beta \to \gamma$ is cofinal,
(3) $\mathrm{cf}(\beta)$,
(4) $\beta$ is strongly inaccessible.

VII.72. Prove that “$\beta$ is a cardinal” is not absolute for $V_N(\omega + 2)$.

VII.73. Let $R$ be a relation on a set $A$. Define $\mathrm{Wf}(R)$, the well-founded part
of $R$, as the set $\{a \in A : \text{there is no infinite chain } \cdots R\, a_2\, R\, a_1\, R\, a\}$
(where, of course, all the $a_i$ are in $A$, since $R \subseteq A \times A$). Prove that
$\mathrm{Cl}(\mathcal{R}) = \mathrm{Wf}(R)$, where the rule set $\mathcal{R}$ on $\mathcal{P}(A) \times A$ is given by

$$X \mapsto x \quad\text{iff}\quad X = R(x)$$


(Hint. For $\subseteq$ use induction on the structure of $\mathrm{Cl}(\mathcal{R})$. For $\supseteq$ recall
VI.2.13 and use MC over $\mathrm{Wf}(R)$.)

VII.74. Prove that the rule set that defines the syntax of the formulas of
propositional calculus is unambiguous. This rule set, $P$, is

$$\begin{aligned}
\emptyset &\mapsto p \quad\text{for each variable } p\\
\{x, y\} &\mapsto (x \vee y)\\
\{x, y\} &\mapsto (y \vee x)\\
\{x\} &\mapsto (\neg x)
\end{aligned}$$

(Hint. Show by induction over $\mathrm{Cl}(P)$ that (1) every member of $\mathrm{Cl}(P)$
has as many “(”-symbols as “)”-symbols, (2) every nonempty proper
(string) prefix of a member of $\mathrm{Cl}(P)$ has strictly more “(”-symbols
than “)”-symbols. The rest should be easy.)

VII.75. Prove that the rule set that defines the syntax of the fully parenthesized
arithmetic expressions (on the symbol set $\{1, 2, 3, \times, +, (, )\}$) is
unambiguous. This rule set, $P$, is

$$\begin{aligned}
\emptyset &\mapsto p \quad\text{for } p = 1, 2, 3\\
\{x, y\} &\mapsto (x + y)\\
\{x, y\} &\mapsto (y + x)\\
\{x, y\} &\mapsto (x \times y)\\
\{x, y\} &\mapsto (y \times x)
\end{aligned}$$

(Hint. As in the preceding exercise.)

VII.76. Imitate Exercise VII.75 to define a rule set that defines all the terms of
a first order language, and prove that your rule set is unambiguous.

VII.77. Imitate Exercise VII.75 to define a rule set that defines all the formulas
of set theory, and prove that this rule set is unambiguous.
(Hint. Brackets, once again, are important.)

VII.78. Prove that for any total operator $\Gamma$ (even one that is a proper class), $\Gamma^{\alpha}$
is a set for all $\alpha$.

VII.79. Prove VII.6.31 differently: Invoke Gödel’s second incompleteness
theorem.
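
The three functions of Exercises VII.23 to VII.25 can be explored numerically before attempting the proofs. The following Python sketch is only an experiment on a finite grid, under the assumption that the garbled formulas read as the standard ones; the names `cantor`, `dyadic`, and `square` are illustrative, and a finite check is of course not a proof.

```python
def cantor(x: int, y: int) -> int:
    """(x + y)(x + y + 1)/2 + y, as in Exercise VII.23."""
    return (x + y) * (x + y + 1) // 2 + y


def dyadic(x: int, y: int) -> int:
    """2^x (2y + 1) - 1, as in Exercise VII.24."""
    return 2 ** x * (2 * y + 1) - 1


def square(x: int, y: int) -> int:
    """(x + y)^2 + y, as in Exercise VII.25."""
    return (x + y) ** 2 + y


n = 40
pairs = [(x, y) for x in range(n) for y in range(n)]

for f in (cantor, dyadic, square):
    images = {f(x, y) for x, y in pairs}
    print(f.__name__, "is 1-1 on the grid:", len(images) == len(pairs))

# Every number below n(n+1)/2 is hit by cantor, and every number below n by dyadic.
print(set(range(n * (n + 1) // 2)) <= {cantor(x, y) for x, y in pairs})  # True
print(set(range(n)) <= {dyadic(x, y) for x, y in pairs})                 # True
# square misses 3: with s = x + y its values lie in [s^2, s^2 + s].
print(3 in {square(x, y) for x, y in pairs})                             # False
```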


VIII


Forcing 


The method of forcing was invented by Cohen (1963) towards the construction 
of non-standard models of ZFC, so that “new axioms” could be proved consis- 
tent with the standard ones. Our retelling of the basics of forcing found in this 
chapter is indebted primarily to the user-friendly account found in Shoenfield 
(1971). The influence of the expositions in Burgess (1978), Jech (1978b), and 
Kunen (1980) should also be evident. 

In outline, the method goes like this: Suppose we want to show that ZFC
(sometimes ZF or an even weaker subtheory) is consistent with some weird
new axiom, “NA”. Working in the metatheory, one starts with a CTM, $M$,†
for ZFC. This is the ground model. One then judiciously chooses a PO set,‡
$(P, <, 1)$, in $M$ (we find it convenient to restrict attention to PO sets
that have a maximum element; let us call the latter “1”) and, using the PO
set, one constructs a so-called generic set $G$. Circumstances normally have $G$
obey $G \notin M$. The “judicious” aspect of the choice of the PO set will entail that
the generic extension, $M[G]$, of the CTM $M$ not only contains $G$ as an element
but is a CTM itself that satisfies NA as well (i.e., $\models_{M[G]}$ ZFC + NA). Thus, one
has a proof in the metatheory that if ZFC is consistent (i.e., if a CTM for ZFC
exists), then so is ZFC + NA.

We have said above that “$(P, <, 1) \in M$”. By absoluteness of pair (see
Section VI.8), the quoted statement is equivalent to “$P \in M$ and ${<} \in M$ and
$1 \in M$”.


† We know that we have used the symbol $M$ for $\{x : U(x)\}$. However, it is normal practice for
people to also denote by $M$ an arbitrary CTM of ZFC. We are rapidly running out of symbols;
therefore we ask the reader to allow us this overloading of the letter $M$ with more meanings than
one. As always, we will invoke context in our defense.

‡ This is the hard part of the method.

The above argument cannot be formalized in ZFC “as is” to provide a finitary
proof of relative consistency, namely, along the lines “if ZFC is consistent, here
is how we can construct a set model for ZFC + NA in ZFC”. Unfortunately
such a construction would formally prove as a corollary (in ZFC) that ZFC is
itself consistent, clashing with Gödel’s second incompleteness theorem.


Pause. In ZF we have shown how to construct a model, $L$, of ZFC + GCH.
Why does this not contradict Gödel’s second incompleteness theorem?


We can still circumvent this difficulty and provide finitary proofs of relative
consistency results of the type “if ZFC is consistent, then ZFC + NA is too”,
using forcing. One attempts the contrapositive instead:

$$\text{If } \vdash_{ZFC+NA} 0 \ne 0, \text{ then } \vdash_{ZFC} 0 \ne 0 \tag{1}$$

But the if part means that for some finite set of axioms of ZFC, $\Gamma$,

$$\Gamma \cup \{NA\} \vdash 0 \ne 0 \tag{2}$$

Now, we can construct a CTM, $M$, just for $\Gamma$, inside ZFC using reflection
(cf. VII.7.23 and the proof of VII.7.24) followed by Mostowski collapsing.
Using forcing, and continuing to work formally inside ZFC, we get a generic
extension of $M$, $M[G]$, that is a formal model of $\Gamma \cup \{NA\}$. Now this is fine
by Gödel’s second theorem, for $\Gamma$ is not the entire ZFC axiom set. By (2), we
have shown $\vdash_{ZFC} (0 \ne 0)^{M[G]}$, and hence $\vdash_{ZFC} 0 \ne 0$, since 0 is absolute for
transitive classes. This concludes the forcing proof of (1) in a finitary manner.


We will do our forcing arguments in the metatheory. 


A Note on Proofs. We will be working in the metatheory throughout most of
this chapter, using the ZF axioms as our hypotheses (sometimes adding AC).
We will do so usually from within $U_A$ (for a usually unspecified set of start-up
atoms, $A$), but often from within some CTM $M$, that is, relativizing formulas
and arguments to $M$.

Our proof terminology will be similar to that of the “practising mathematician”.
We will say “true” or “false” for sentences (whereas before we have
always said “provable” or “refutable”) and, moreover, we will usually refrain
from reminding the reader that this or that principle of logic (e.g., proof by
auxiliary constant, deduction theorem, proof by cases, etc.) is at work.


VIII.1. PO Sets, Filters, and Generic Sets


In this chapter we will fix attention on special types of PO sets that have a 
maximum element (which is, of course, unique), invariably denoted by “1”. 


VIII.1.1 Definition. Let $(P, <, 1)$ be a PO set with a maximum element,
which we will denote by “1”. Throughout this chapter we will call the members
of $P$ forcing conditions or just conditions, and such PO sets notions of forcing.
We will use the letters $p, q, r, s$, with or without primes or subscripts, for
conditions.

If $p = q \lor p < q$, we write $p \le q$ and say that $p$ extends, or is an extension
of, $q$.† When $p < q$, then $p$ is a proper extension of $q$. Two conditions $p$ and
$q$ are compatible iff there is an $r \in P$ such that $r \le p$ and $r \le q$. If two
conditions $p$ and $q$ are not compatible, then they are incompatible and we write
$p \perp q$.

A set $O \subseteq P$ is open iff, for any $p \in O$, ${\le}(p) \subseteq O$, where ${\le}(p)$ denotes
the segment $\{q \in P : q \le p\}$. In particular, every segment ${\le}(p)$ is open.

A set $D \subseteq P$ is dense iff it meets every nonempty open set. In other words, for every
$p \in P$ there is a $q \in D$ such that $q \le p$ (i.e., $D \cap {\le}(p) \ne \emptyset$).

A chain over the PO set‡ $P$ is a set $C \subseteq P$ such that any two elements of $C$
are comparable. The abbreviation “$p$ and $q$ are comparable” stands for
“$p = q \lor p < q \lor q < p$”.

An antichain over the PO set $P$ is a set $A \subseteq P$ such that every two of its
members are incompatible.


VIII.1.2 Remark. In structure parlance (cf. Section I.5) a PO set $(P, <, 1)$ is
a structure, with underlying set (or domain) $P$ and with $<$ and 1 as specified
relation and function (a 0-ary function or constant) respectively. Thus, if need
arises, we will use fraktur type (and the same letter as the domain) to name the
structure; in the present case, $\mathfrak{P} = (P, <, 1)$.

The terms “open” and “dense” are not accidental. There is a strong relation
between the homonymous topological concepts and forcing, but this connection
with topology will not be pursued here. By the way, the fully qualified terms
are $\mathfrak{P}$-open and $\mathfrak{P}$-dense respectively, but usually the qualification is omitted
and $\mathfrak{P}$ is understood from the context.


† Yes, the extension is the “smaller” of the two. This terminology is due to Cohen (1963).
‡ Recall that when the order $<$ is understood, we say “PO set $P$” instead of “PO set $(P, <, 1)$”.

VIII.1.3 Example. The set of finite functions from $\omega$ to $\{0, 1\}$, i.e.,

$$\{f : f \text{ is a function and } \mathrm{dom}(f) \in \omega \land \mathrm{ran}(f) \subseteq 2\} \tag{1}$$

can be given a PO set structure as follows: Define $f < g$ to mean $g \subset f$ (we
order by “reverse inclusion”). Here $\emptyset : \emptyset \to 2$ is 1, the maximum element. For
the sake of future reference, let us give this PO set the name $\mathfrak{O} = (O, <, 1)$
(or $\mathfrak{O} = (O, \supset, \emptyset)$), where we have used the name “$O$” for the set displayed
in (1). We also note that $\mathfrak{O}$ has no minimal elements. Such an element would
be a finite function that had no finite proper extension.
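
The reverse-inclusion order can be tried out on a machine. The following is only a minimal sketch, assuming that conditions are modeled as Python dicts standing for finite partial functions into {0, 1}; the names `leq`, `compatible` and `ONE` are illustrative and do not come from the text.

```python
def leq(f, g):
    """f <= g in the forcing order: f extends g, i.e. g is a subset of f."""
    return all(k in f and f[k] == v for k, v in g.items())

def compatible(f, g):
    """f and g have a common extension iff they agree wherever both are defined."""
    return all(f[k] == g[k] for k in f.keys() & g.keys())

def proper_extension(f):
    """Some proper extension of f (so no condition is minimal)."""
    g = dict(f)
    g[max(f, default=-1) + 1] = 0   # define g at the first argument past dom(f)
    return g

ONE = {}               # the empty function, the maximum element "1"
p = {0: 1}
q = {0: 1, 1: 0}       # q extends p, so q <= p
r = {0: 0}             # r disagrees with p at 0, so p and r are incompatible
```

In this model the extension carries more information and is therefore the "smaller" condition, in line with the terminology of VIII.1.1.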


VIII.1.4 Definition. Let $(P, <, 1)$ be a PO set as in VIII.1.1.

A set $F \subseteq P$ is a filter over $(P, <, 1)$ (also called a $(P, <, 1)$-filter, or a
$P$-filter if $<$ is understood, or just a filter if $P$ is also understood) provided the
following conditions are fulfilled:


(1) $1 \in F$.

(2) For any two members $p$ and $q$ of $F$ there is an $r$ in $F$ such that $r \le p$ and
$r \le q$.

(3) If $p \in F$ and $p \le q$, then $q \in F$ (or, $p \in F \to {\ge}(p) \subseteq F$).


In view of (3) in VIII.1.4, (1) is equivalent to the requirement “$F \ne \emptyset$”.
Note that in (2) we have asked more than compatibility of any two members
of $F$: We want this compatibility to be “witnessed” inside $F$.


In algebra people define their filters in a stronger manner. First off, one requires
that the PO set $\mathfrak{P}$ be a lattice, that is, for any two of its members $p$ and $q$, both
$\sup\{p, q\}$ and $\inf\{p, q\}$ exist.† One then calls an $F \subseteq P$ a filter if it satisfies


(i) $F \ne \emptyset$ (same as (1) in VIII.1.4 if the lattice has a maximum element),
(ii) for any two members $p$ and $q$ of $P$, $\inf\{p, q\} \in F$ iff $\{p, q\} \subseteq F$.


Note that if $F$ is a filter in the sense (i)–(ii) over the lattice $\mathfrak{P} = (P, <, 1)$,
then it is as well in the sense (1)–(3) of VIII.1.4 over the PO set $\mathfrak{P}$. Indeed,
by (ii), if $p$ and $q$ are in $F$, then so is $\inf\{p, q\}$, providing the “witness” that (2)
requires. Also, if $p \in F$ and $p \le q$ ($q \in P$), then $p = \inf\{p, q\}$; hence (by (ii))
$q \in F$.


† sup and inf were defined in VI.5.23.

VIII.1.5 Example. Fix a PO set $\mathfrak{P}$. The set $\{1\}$ is a filter.

If $\emptyset \ne S \subseteq P$ is a chain, then we can build the $\subseteq$-smallest filter that contains
$S$. We call it the filter generated by $S$.

To see that this exists, define

$$F = \{p \in P : (\exists q \in S)\, q \le p\} \tag{$*$}$$

Trivially (by $x \le x$), $S \subseteq F$. We next verify that $F$ is a filter. For property (1)
(of VIII.1.4), pick any $q \in S$ ($S$ is not empty). But then $q \le 1$; hence $1 \in F$
by ($*$). Property (3) is also trivially verified. As for (2), let $p$ and $p'$ be in $F$.
Then $q \le p$ and $q' \le p'$ for some $q$ and $q'$ in $S$. For the sake of concreteness,
say $q \le q'$ (by comparability of $S$-elements). Then $q$ is an appropriate witness
for the compatibility of $p$ and $p'$: it is in $F$ (since $S \subseteq F$), and $q \le p$ and
$q \le q' \le p'$.

Let $F'$ be any filter such that

$$S \subseteq F' \tag{$**$}$$

Let $p \in F$ and also $q$ (in $S$, by ($*$)) such that $q \le p$. Since $q \in F'$ by ($**$) and
$F'$ is a filter, it follows that $p \in F'$. Thus, $F \subseteq F'$.
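
The construction of the generated filter can be checked by brute force on a small finite PO set. This is a sketch under stated assumptions: conditions are frozensets of (argument, value) pairs defined on {0, 1, 2}, ordered by reverse inclusion, and the helper names `all_conditions`, `generated_filter` and `is_filter` are hypothetical.

```python
from itertools import combinations, product

def all_conditions(n):
    """All partial functions from {0,...,n-1} to {0,1}, as frozensets of pairs."""
    conds = []
    for size in range(n + 1):
        for dom in combinations(range(n), size):
            for vals in product((0, 1), repeat=size):
                conds.append(frozenset(zip(dom, vals)))
    return conds

def leq(p, q):
    """p <= q: p extends q (reverse inclusion)."""
    return q <= p

def generated_filter(P, S):
    """The set F of (*): every p in P that lies above some member of S."""
    return {p for p in P if any(leq(q, p) for q in S)}

def is_filter(P, F, one):
    """Check clauses (1)-(3) of the filter definition by exhaustive search."""
    has_one = one in F
    witnessed = all(any(leq(r, p) and leq(r, q) for r in F) for p in F for q in F)
    upward = all(q in F for p in F for q in P if leq(p, q))
    return has_one and witnessed and upward

P = all_conditions(3)
S = [frozenset({(0, 1)}), frozenset({(0, 1), (2, 0)})]   # a two-element chain
F = generated_filter(P, S)
```

Here `F` consists of the four restrictions of the smallest member of the chain, and `is_filter(P, F, frozenset())` confirms the three clauses for this instance.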


VIII.1.6 Example. Refer to the PO set $\mathfrak{O}$ of VIII.1.3. Suppose that $F$ is a filter
over $\mathfrak{O}$. Then $\bigcup F$ is single-valued (by VIII.1.4(2)), i.e., a function $\omega \to 2$.


VIII.1.7 Definition (Generic Sets). Given a PO set $\mathfrak{P} = (P, <, 1)$ and a set
$M$. A subset $G \subseteq P$ is called $M$-generic iff


(1) $G$ is a filter over $\mathfrak{P}$,
(2) $G$ meets every $\mathfrak{P}$-dense set $D$ that is a member of $M$ (that is, $G \cap D \ne \emptyset$).


The reader is reminded that the phrase “$D$ is $\mathfrak{P}$-dense” subsumes the
subphrase “$D \subseteq P$”; see VIII.1.1 and VIII.1.2.


VIII.1.8 Theorem (Generic Existence Theorem). Let $\mathfrak{P} = (P, <, 1)$ be a
notion of forcing, and $M$ be countable. Fix any $p \in P$. Then there is an
$M$-generic set $G \subseteq P$ such that $p \in G$.


Proof. Let $m_0, m_1, m_2, \ldots$ be a fixed enumeration of $M$. We define by recursion
a function $f$ on $\omega$ by

$$f(0) = p$$

and

$$f(n+1) = \begin{cases} m_k, \text{ where } k \text{ is smallest such that } m_k \in {\le}(f(n)) \cap m_n & \text{if } {\le}(f(n)) \cap m_n \ne \emptyset \\ f(n) & \text{if } {\le}(f(n)) \cap m_n = \emptyset \end{cases} \tag{$*$}$$

Note that the explicit definition of the subscript “$k$” above avoids an infinite
set of “unspecified choices” (AC). Also note that the last case above always
obtains if $m_n$ is an urelement.

It is easy to see that $\mathrm{ran}(f)$ is a nonempty ($p \in \mathrm{ran}(f)$) chain: First off, that
$\mathrm{ran}(f) \subseteq P$ is trivial. Next, the reader can verify that $(\forall n \in \omega)\, f(n+1) \le f(n)$
holds (induction on $n$; the last case in the definition of $f$ guarantees that $f$ is
total on $\omega$).

Taking for $G$ the filter generated by the chain $\mathrm{ran}(f)$ (see VIII.1.5) will
do. Indeed, that $p \in G$ is trivial. Let then $D \in M$ be dense. Now $D = m_n$
for some $n$. Since $D$ is dense, ${\le}(f(n)) \cap D \ne \emptyset$, so the first condition in ($*$)
applies and gives us $f(n+1) = m_k$, where
$m_k \in {\le}(f(n)) \cap D$. Since $m_k \in G$, we have $G \cap D \ne \emptyset$.
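
The recursion in the proof can be run on a finite stand-in. This is a sketch only: it assumes that the countable set is replaced by a finite list `M` holding every condition on {0, 1} together with two dense sets, and that conditions are frozensets of pairs ordered by reverse inclusion. The names `generic_chain` and `generated_filter` are illustrative.

```python
from itertools import combinations, product

def all_conditions(n):
    """All partial functions from {0,...,n-1} to {0,1}, as frozensets of pairs."""
    conds = []
    for size in range(n + 1):
        for dom in combinations(range(n), size):
            for vals in product((0, 1), repeat=size):
                conds.append(frozenset(zip(dom, vals)))
    return conds

def generic_chain(M, P, p):
    """f(0) = p; f(n+1) = the first m_k in the enumeration that is a condition
    extending f(n) and belonging to m_n, or f(n) again if there is none."""
    conditions = set(P)
    chain = [p]
    for m_n in M:
        cur = chain[-1]
        nxt = cur                        # second case of the recursion
        for m_k in M:                    # least k: scan the enumeration in order
            if m_k in conditions and cur <= m_k and m_k in m_n:
                nxt = m_k
                break
        chain.append(nxt)
    return chain

def generated_filter(P, chain):
    """Upward closure of the chain: the filter it generates."""
    return {q for q in P if any(q <= c for c in chain)}

P = all_conditions(2)
D0 = frozenset(q for q in P if any(k == 0 for k, _ in q))   # conditions defined at 0
D1 = frozenset(q for q in P if any(k == 1 for k, _ in q))   # conditions defined at 1
M = P + [D0, D1]
chain = generic_chain(M, P, frozenset())
G = generated_filter(P, chain)
```

Scanning the enumeration in a fixed order plays the role of the "smallest $k$" clause, so no arbitrary choices are made. The resulting `G` contains the starting condition and meets both dense sets listed in `M`.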


VIII.1.9 Example. Let $M$ be a CTM for, say, ZF. We will consider the PO
set of VIII.1.3 relativized in $M$. By absoluteness of pairing and finiteness (see
Section VI.8), $\{a, b\}$, ordered pairs, and finite functions are ($M$-)absolute, and
so is $P_\omega(A)$ defined as $\{x : x \subseteq A \land x \text{ is finite}\}$ (see Exercise VIII.3). We also
recall that finite ordinals (and $\omega$), dom, and ran are absolute for $M$.

Thus one may redo Example VIII.1.3, this time arguing from within $M$, as
an inhabitant† of $M$ would do, to obtain in $M$ the PO set $\mathfrak{P} = (P, \supset, \emptyset)$, where

$$P = \{p : p \text{ is a function} \land \mathrm{dom}(p) \in \omega \land \mathrm{ran}(p) \subseteq 2\} \tag{1}$$

He will conduct his argument by noting that $\omega$ and 2 are in $M$, and therefore so
is $\omega \times 2$; thus $P \in M$, by separation, since‡ $P = \{p \in P_\omega(\omega \times 2) : p \text{ is a function} \land \mathrm{dom}(p) \in \omega\}$
(he knows that $M$ is a ZF model, so he can do all that). It
then follows that $\mathfrak{P}$ is in $M$ as well, by the fact that $M$ is closed under pairing.


Pause. Why doesn’t he just say that $P$ is a set by separation, since $P \subseteq M$?

Equivalently, a being of $U_A$ argues the same thing by making the case that
$P^M$ given in

$$P^M = \{p \in P_\omega(\omega \times 2) : p \text{ is a function} \land \mathrm{dom}(p) \in \omega\} \tag{1$'$}$$

is in $M$, since separation holds in $M$ and $P_\omega(\omega \times 2) \in M$ (why?). Here we are
invoking VI.8.4 and VI.8.13, noting that, for $p \in M$, absoluteness makes the
presence of a superscript “$M$” redundant inside the braces in (1$'$) above.


In particular, all this shows that $P$ (hence also $\mathfrak{P}$) is absolute for $M$.


† This person uses just “$\{p : \text{etc.}\}$” rather than “$\{p \in M : \text{etc.}\}$”, since the $\in M$ part is implicit;
there are no universes beyond $M$ for him.

‡ For him $P_\omega(A)$ consists precisely of those finite subsets of $A$ that are also members of $M$, which
are all the finite subsets of $A$, absolutely speaking.


Let $G$ be $M$-generic. Such a $G$ can be constructed by VIII.1.8, since $M$ is
countable. Let us continue working in $U_A$ to construct $G$.

Since $G$ is a filter, $\bigcup G$ is a function (VIII.1.6). Note that, for any $n \in \omega$,
the set $D_n = \{p \in P : n \in \mathrm{dom}(p)\}$ is $P$-dense in $M$; for if $q \in P$, then either
$q(n)\downarrow$ (in which case $q \in D_n$; indeed, any extension $p \le q$ is in $D_n$)† or
$q(n)\uparrow$. Well, then, define $p = q \cup \{(n, 0)\}$ (this is in $M$, by absoluteness of $\cup$
and pairing). We have $p < q$ and $p \in D_n$. Thus $G$ meets all the $D_n$; in other
words, $n$ is in the domain of $\bigcup G$ for all $n$: $\mathrm{dom}(\bigcup G) = \omega$.

Is $G \in M$? Suppose yes, and consider the set (in $M$ by absoluteness of
difference) $P - G$. This is $P$-dense in $M$: Let $p \in P$. Let $q$ and $r$ be two
incompatible extensions of $p$ in $M$. For example, say $n$ is smallest such that
$p(n)\uparrow$. Set then $q = p \cup \{(n, 0)\}$ and $r = p \cup \{(n, 1)\}$. Now $q$ and $r$ cannot
both be in $G$, for $q \perp r$. Say $q \notin G$. But then $q \in P - G$. Having established
the density of $P - G$ (in $M$), genericity would now imply $(P - G) \cap G \ne \emptyset$,
a contradiction.

We have said above “Let us continue working in $U_A$ to construct $G$”. We
see that such caution was justified, for an inhabitant of $M$ cannot construct
this $G$.


VIII.2. Constructing Generic Extensions 


Our purpose is to define a procedure which for any CTM $M$ and $M$-generic set
$G$ builds a CTM, $M[G]$, that is the $\subseteq$-smallest extension of $M$ containing $G$
as a member. Such an extension is called a generic or Cohen extension of the
ground model $M$.

Moreover, we want to be able to empower inhabitants of M to discuss aspects 
of M[G] notwithstanding the fact that a lot of objects in M[G] are not in M — 
for example, G under “practical circumstances” is not; see VIII.1.9. 


To keep our sanity, we will usually employ the language and methods of ZF
(or ZFC, or even of a fragment of ZF) “in the abstract” (i.e., formally) to effect
our various constructions.‡ We can afterwards relativize what we have done
to some CTM $M$, using results from Sections VI.8 and, sometimes, VI.9. On
occasion, it might be just as easy to work as an inhabitant of $M$ would and,
using the methods of ZFC (or ZF, or ...), argue in effect from within $M$, as we
have done in the initial part of VIII.1.9.


† Recall that this $\mathfrak{P}$ has no minimal elements.

‡ One can view this approach alternatively: We are really working, metamathematically, within the
“real” universe, $U_A$, however restricting our methodology to only employ ZF axioms and rules
of logic. Of course, any confirmed Platonist who has followed us this far will say: “That’s an odd
comment; wasn’t it exactly this approach that we took all along?”


Our discussion will be, in general, dependent upon a PO set “variable”, which
we will invariably call by the nondescript name $\mathfrak{P} = (P, <, 1)$. For convenience
we will use the (fairly standard) notation

$$|\mathfrak{P}| \text{ stands for } P \tag{1}$$


VIII.2.1 Definition. Let $M$ be a set and $\mathfrak{P} \in M$. For any $G \subseteq |\mathfrak{P}|$ we introduce
the following abbreviation:

$$a \in_G b \quad\text{stands for}\quad (\exists p \in G)\,(a, p) \in b \tag{1}$$


VIII.2.2 Remark. We have fixed an $M$ and $\mathfrak{P} \in M$. The relation $x \mathrel{P} y$ defined
by $(\exists G \subseteq |\mathfrak{P}|)\, x \in_G y$ has MC, as this follows from $x \mathrel{P} y \to \rho(x) < \rho(y)$, this
latter since $x \in (x, p) \in y$ for some $p \in G$, for some $G \subseteq |\mathfrak{P}|$† (cf. VI.6.24). It is
also left-narrow: $x \mathrel{P} y \to x \in TC(y)$. Thus we can effect recursive definitions
with respect to $P$, and in particular with respect to $\in_G$ (fixed $G$), as well as do
$P$-induction and $\in_G$-induction (fixed $G$). Cf. VI.8.23 and VI.8.24.


VIII.2.3 Definition. Let $M$ be a set, $\mathfrak{P} \in M$, and $G \subseteq |\mathfrak{P}|$. Working in $U_A$,‡
we define the interpretation function, $\lambda xG.x^G$, by $P$-recursion:

$$x^G = \begin{cases} x & \text{if } U(x) \\ \{y^G : y \in_G x\} & \text{if } \lnot U(x) \end{cases} \tag{1}$$

We will call $a^G$ the $G$-interpretation of $a$.


More completely, we should add to the definition (1) above, second case, the
conjunct “$\land\, G \subseteq P \land (P, <, 1) \in M$ is a PO set”.§ One then adds a third,
“otherwise” case where, say, $x^G = \emptyset$. This “completion” spoils the clean form
of (1) and adds or subtracts nothing to or from the expected properties of $x^G$
used in the sequel. Thus we have stated the missing conditions loosely in the
“assumptions” instead.

Note that if we fix $G$, then the above defines $\lambda x.x^G$ by $\in_G$-recursion using
$G$ as a “parameter”. But whence “interpretation”? This terminology will make
sense in the next section.


† Or $x \in \{x\} \in (x, p) \in y$ if one uses the Kuratowski “(…, …)”.

‡ In ZF in fact, “abstractly” or formally, as we do not use an assumption that $M$ is a CTM.

§ We mean that $(P, <, 1)$ is “really” a PO set, as we carry the definition out in $U_A$. Anyhow, if $M$
is a CTM, absoluteness of being a PO set would make $(P, <, 1)$ a PO set in the eyes of people
living in $M$ as well.
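
The recursion (1) of VIII.2.3 is short enough to execute on hereditarily finite objects. The following is a sketch under assumptions that are not in the text: sets are Python frozensets, anything that is not a frozenset (here, strings) plays the role of an urelement, ordered pairs are Python tuples, and the function name `interpret` is hypothetical.

```python
def interpret(x, G):
    """G-interpretation of x: atoms are left alone; a set x is sent to
    { interpret(y, G) : (y, p) in x for some p in G }."""
    if not isinstance(x, frozenset):     # the case U(x)
        return x
    return frozenset(interpret(y, G) for (y, p) in x if p in G)

ONE = frozenset()                        # maximum condition (the empty function)
p0 = frozenset({(0, 0)})
p1 = frozenset({(0, 1)})                 # p0 and p1 are incompatible

tau = frozenset({("a", p0), ("b", p1)})  # its value depends on which condition G holds
sigma = frozenset({(tau, ONE)})          # a name one level up

G0 = {ONE, p0}
G1 = {ONE, p1}
```

With these inputs `interpret(tau, G0)` is the set containing only `"a"`, whereas `interpret(tau, G1)` is the set containing only `"b"`: the same object of the ground model is read differently by different $G$.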


Finally: 


VIII.2.4 Definition. Let $M$ be a CTM of ZF, $\mathfrak{P} \in M$, and $G$ an $M$-generic set.
We define the set $M[G]$ by

$$M[G] = \{x^G : x \in M\}$$


We call M[G] a generic extension or Cohen extension of the ground 
model M. 


VIII.2.5 Remark. By results and techniques of Section VI.8, $\lambda xG.x^G$ is
absolute for transitive models of ZF (see Exercise VIII.4). Thus if $M \subseteq N$ and $N$
is a CTM (of ZF) that satisfies $G \in N$, then

$$N \ni (a^G)^N = a^G$$

for any $a \in N$, in particular for any $a \in M$. Thus, $M[G] = \{x^G : x \in M\} \subseteq N$.


We next build tools to show that for any CTM $M$ and $M$-generic $G$, we have
$M \subseteq M[G]$ and $G \in M[G]$.


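VIII.2.6 Definition (The Caret). We define (in ZF formally or in $U_A$
metamathematically) by $\in$-recursion a function $\lambda x\mathfrak{P}.\hat{x}$:

$$\hat{x} = \begin{cases} x & \text{if } U(x) \\ \{(\hat{y}, 1) : y \in x\} & \text{otherwise} \end{cases}$$

where $\mathfrak{P}$ has 1 as the maximum element.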


VIII.2.7 Remark. Again, there ought to be a third, “otherwise” case above,
yielding, say, $\hat{x} = \emptyset$, whenever the “input” $\mathfrak{P}$ is not as it should be. The present
“otherwise” would then become the case “$\lnot U(x) \land (\mathfrak{P}$ is as it should be)”.

Work in Section VI.8 easily yields that the function $\lambda x\mathfrak{P}.\hat{x}$ is absolute
(cf. Exercise VIII.5). Thus if $M$ is a CTM of ZF and $\mathfrak{P} \in M$, then $x \in M$
implies that $\hat{x} = (\hat{x})^M \in M$.
In particular, $((\lambda x\mathfrak{P}.\hat{x}) \restriction M) \in M$. This latter fact can also be seen as follows:
A mathematician who lives in $M$ can effect the recursive definition VIII.2.6
in $M$.


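VIII.2.8 Lemma. Let $M$ be a CTM of ZF, and $G$ an $M$-generic set with respect
to $\mathfrak{P} \in M$. Then $M \subseteq M[G]$ and $G \in M[G]$.


Proof. Let $x \in M$. We do $\in$-induction to prove $(\hat{x})^G = x$; thus $M \subseteq M[G]$,
since $\hat{x} \in M$ by VIII.2.7 and therefore $(\hat{x})^G \in M[G]$ (cf. VIII.2.4).

If $U(x)$, then $(\hat{x})^G = x^G = x$.

Let now $\lnot U(x)$. Then

$$\begin{aligned}
(\hat{x})^G &= \{y^G : y \in_G \hat{x}\} && \text{by VIII.2.3} \\
&= \{y^G : (\exists p \in G)\,(y, p) \in \hat{x}\} \\
&= \{y^G : (y, 1) \in \{(\hat{z}, 1) : z \in x\}\} && \text{since } 1 \in G \\
&= \{(\hat{z})^G : z \in x\} \\
&= \{z : z \in x\} && \text{by I.H.} \\
&= x
\end{aligned}$$

To prove $G \in M[G]$, we look for an element $\Gamma \in M$ such that $\Gamma^G = G$. We
will calculate that

$$\Gamma = \{(\hat{p}, p) : p \in P\} \tag{$*$}$$

will do fine. By closure of $M$ under pairs and by the fact that collection is true
in $M$, $\Gamma \in M$. We next calculate

$$\begin{aligned}
\Gamma^G &= \{y^G : (\exists p \in G)\,(y, p) \in \Gamma\} \\
&= \{(\hat{p})^G : p \in G\} \\
&= \{p : p \in G\} && \text{by what we have proved above} \\
&= G
\end{aligned}$$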
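
Both calculations in this proof can be replayed on hereditarily finite data. This is a sketch with simplifying assumptions that are not in the text: sets are frozensets, every non-frozenset object (including the Python tuples used as ordered pairs) is treated as an atom, and the names `interpret`, `check` and `gamma` are illustrative.

```python
def is_atom(x):
    return not isinstance(x, frozenset)

def interpret(x, G):
    """x^G: collect interpret(y, G) over all pairs (y, p) in x with p in G."""
    if is_atom(x):
        return x
    return frozenset(interpret(y, G) for (y, p) in x if p in G)

def check(x, one):
    """The caret: attach the maximum condition to every member, hereditarily."""
    if is_atom(x):
        return x
    return frozenset((check(y, one), one) for y in x)

ONE = frozenset()                                    # maximum condition
p0 = frozenset({(0, 1)})
p01 = frozenset({(0, 1), (1, 0)})
other = frozenset({(0, 0)})
P = [ONE, p0, p01, other]
G = {ONE, p0, p01}                                   # a filter over this small PO set

two = frozenset({frozenset(), frozenset({frozenset()})})   # the ordinal 2
gamma = frozenset((check(p, ONE), p) for p in P)           # the name for G
```

Because the maximum condition belongs to every filter, `interpret(check(two, ONE), G)` returns `two`, and `interpret(gamma, G)` returns exactly the members of `G`, as in the two displayed computations.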


VIII.2.9 Remark. $M$ and $M[G]$ have the same urelements. Indeed, $M \subseteq M[G]$
yields that all atoms of $M$ are included in $M[G]$. Conversely, suppose that $U(a^G)$
is true in $M[G]$, and hence in $U_A$ (recall from Section VI.8 that we set $U^M = U$
in general in $(U, \in)$-interpretations). From VIII.2.3, $a^G = a$ (otherwise $a^G$ is a
set). Thus, $a^G \in M$ (since $a \in M$).


VIII.2.10 Example. $M[G]$ is closed under pairs, that is, $z \in M[G] \land
w \in M[G] \to \{z, w\} \in M[G]$.

It suffices to find a $u \in M$ such that $u^G = \{z, w\}$. We start by letting $a^G = z$
and $b^G = w$, with $a$ and $b$ in $M$ (VIII.2.4). Since $M$ is a CTM of, say, ZF, its
closure under pairing yields $\{(a, 1), (b, 1)\} \in M$. Now $u = \{(a, 1), (b, 1)\}$ is what
we want, for, by VIII.2.3,

$$u^G = \{a^G, b^G\}$$

since $a \in_G u$ and $b \in_G u$.


VIII.2.11 Remark. So far we readily have the “C” and “T” of the expected
CTM attributes of $M[G]$. Indeed, by VIII.2.4, the function $\lambda x.x^G : M \to M[G]$
is onto. Since the left field is countable, this settles the “C”. As for transitivity,
let $x \in y^G \in M[G]$. Then $y^G$ is a set, thus (VIII.2.3) $x = z^G$ for some $z \in_G y$.
Therefore, for some $p \in |\mathfrak{P}|$, $z \in (z, p) \in y \in M$; hence $z \in M$ by transitivity
of $M$. Finally, $x \in M[G]$ by VIII.2.4.


For the “M”, that $M[G]$ is a model of, say, ZFC if $M$ is, we need more work.


VIII.3. Weak Forcing


Let us fix a CTM $M$ (of ZF, or ZFC, or of some extension, or of a sufficiently
strong fragment such as ZF without the power set axiom, the so-called
“ZF − P”), as well as a PO set $\mathfrak{P} \in M$. We will be working in the metatheory,
within $U_A$, using those axioms of ZFC for which $M$ is a model. Suppose that
we have built $M[G]$ for some $M$-generic $G \subseteq |\mathfrak{P}|$, as in VIII.2.4.

We want to allow inhabitants of M to reason about this M[G]. For this 
purpose they need names for the “real objects” of M[G] so that they can write 
down formulas that can refer to specific objects of M[G], such as G. 

Now, by VIII.2.4, any object $a \in M[G]$ has the form $b^G$ for some $b \in M$.
This $b$, or, formally, a name for $b$, will name $a$.

Thus, we import into the basic language of ZFC, $L_{\mathrm{Set}}$, a new constant symbol
name for each member of $M$.† This is a process familiar to us from I.5.4.
However, rather than only using the (argot) names $i, j, k$ for members of $M$
(and $\overline{i}, \overline{j}, \overline{k}$ for their formal counterparts in $L_{\mathrm{Set},M}$), we will continue using any
(argot) name we please for members of $M$ (such as $a, b, q, p, i, \ldots$) and, as
in I.5.4, reserve the argot names $\overline{a}, \overline{b}, \overline{p}, \overline{i}, \ldots$ to stand for names of imported
constants. Thus, $\overline{a}$ names $a$, etc. $L_{\mathrm{Set},M}$ is called the forcing language.

We will now view $M[G]$ as the domain of the structure $\mathfrak{M}_G = (M[G], U, \in)$,
and we interpret the language $L_{\mathrm{Set},M}$ in $\mathfrak{M}_G$ in the standard manner (Section I.5).


† As we did with $L_{\mathrm{Set}}$, we are free afterwards to extend the augmented language by definitions.

We moreover let

$$(\overline{a})^{\mathfrak{M}_G} = a^G \quad\text{for each } a \in M \tag{1}$$

Thus every formula over $L_{\mathrm{Set},M}$ says something about $M[G]$.†


Caution. It is important to note that even though the names $\overline{a}$ were “built” by
importing into $L_{\mathrm{Set}}$ names of objects of $M$, they are primarily used to name
(via the interpretation (1)) objects of $M[G]$, not objects of $M$.
Whenever we interpret a formula in the structure $\mathfrak{M} = (M, U, \in)$, the names
$\overline{a}, \ldots$ are interpreted as $(\overline{a})^{\mathfrak{M}} = a, \ldots$.‡


Of course, in a roundabout way, using the “caret” (cf. VIII.2.6), certain
formal constants will be interpreted, via (1), into members of $M$:

$$(\overline{\hat{a}})^{\mathfrak{M}_G} = (\hat{a})^G = a \quad\text{(cf. VIII.2.8)}$$

After all, $M \subseteq M[G]$.

Under the interpretation of $L_{\mathrm{Set},M}$ in $\mathfrak{M}_G$ above, some names are interpreted
as objects that are not in $M$. For example, we have seen circumstances under
which an $M$-generic set $G$ is not a member of $M$ (VIII.1.9), yet there is a name
for $G$ in $L_{\mathrm{Set},M}$, namely, $\overline{\Gamma}$ (VIII.2.8).

We will find the following notation convenient: 


VIII.3.1 Definition. For any $M$-generic $G$, $L_{\mathrm{Set}}$ formula $\mathscr{A}(x_1, x_2, \ldots, x_n)$,§
and $a_1, \ldots, a_n$ in $M$ we write

$$G \models \mathscr{A}(\overline{a}_1, \overline{a}_2, \ldots, \overline{a}_n) \tag{1}$$

or

$$G \models \mathscr{A}[\![a_1^G, a_2^G, \ldots, a_n^G]\!]$$

as a shorthand for

$$\models_{\mathfrak{M}_G} \mathscr{A}(\overline{a}_1, \overline{a}_2, \ldots, \overline{a}_n) \tag{2}$$

or

$$\models_{\mathfrak{M}_G} \mathscr{A}[\![a_1^G, a_2^G, \ldots, a_n^G]\!] \quad\text{(cf. I.5.17)}$$


† This procedure justifies the name “interpretation” for the function $x \mapsto x^G$.

‡ E.g., in VIII.4.10.

§ Under the term “$L_{\mathrm{Set}}$ formula” we include formulas over $L_{\mathrm{Set}}$ that may contain defined symbols.
However these formulas must have no $M$-symbols $\overline{a}$, etc.

To a person living in $U_A$, (2), and hence also (1), above mean the same thing
as (cf. VI.8.4)

$$\models_{U_A} \mathscr{A}^{M[G]}[\![a_1^G, a_2^G, \ldots, a_n^G]\!]$$

We will usually write the above using round brackets (argot). Similarly, one
abuses notation slightly and writes

$$\models_{\mathfrak{M}_G} \mathscr{A}(a_1^G, a_2^G, \ldots, a_n^G) \tag{3}$$

instead of (2). However, to the right of $\models$ one normally expects to see a
well-formed formula over our language, here $L_{\mathrm{Set},M}$. Thus, a “real” object (from the
structure $\mathfrak{M}_G$) can appear in a formula only by its formal name, $\overline{a}$, rather than
by an informal name such as $a$, unless one writes in mixed mode using $[\![\ldots]\!]$
brackets.†


VIII.3.2 Definition (Weak Forcing). Let $M$ be a CTM and $\mathfrak{P} \in M$, while
$p \in |\mathfrak{P}|$. For any $L_{\mathrm{Set}}$ formula $\mathscr{A}(x_1, x_2, \ldots, x_n)$ and $a_i \in M$ we write

$$p \Vdash^w \mathscr{A}(\overline{a}_1, \overline{a}_2, \ldots, \overline{a}_n) \tag{1}$$

pronounced $p$ weakly forces the sentence $\mathscr{A}(\overline{a}_1, \overline{a}_2, \ldots, \overline{a}_n)$, to mean

$$(\forall G \subseteq |\mathfrak{P}|)\bigl(G \text{ is } M\text{-generic} \land p \in G \text{ implies } G \models \mathscr{A}(\overline{a}_1, \overline{a}_2, \ldots, \overline{a}_n)\bigr)$$

Ideally one ought to use a subscript “$(M, \mathfrak{P})$” to the symbol “$\Vdash^w$”, but such
pedantry is almost never practised or needed.


In conformity with the mixed-mode notation of VIII.3.1, we may also write
instead of (1)

$$p \Vdash^w \mathscr{A}[\![a_1, a_2, \ldots, a_n]\!] \tag{2}$$

Note that since (1) is to be investigated within $M$, each $\overline{a}_i$ is interpreted as
$a_i$ in $M$, which leads to (2). One then abuses notation and uses round brackets
in (2) (more on this below).

Weak forcing is due to Feferman (1965). Intuitively, one can think of $p$
as “a finite amount of information” that is sufficient to make good the claim
“$\mathscr{A}(\overline{a}_1, \overline{a}_2, \ldots, \overline{a}_n)$ is true in $G$”‡ in all the infinite (generic) extensions of $p$.§ In
effect, $p$ forces the truth of $\mathscr{A}(\overline{a}_1, \overline{a}_2, \ldots, \overline{a}_n)$ in $M[G]$. Note that an inhabitant
of $M$ can write down (1), but it is unclear a priori just how he might verify
it or refute it, for he has no knowledge, in general (cf. VIII.1.9), of generic
sets except by (formal) name; all he knows on faith is that generic sets are
objects found beyond the universe he lives in. Yet we will see in the next section
that this apparent dependence of forcing on $G$-sets and knowledge of “things
outside $M$” can be circumvented.


† This same apparent hairsplitting is what made us import constants in I.5.4 towards defining the
Tarski semantics for first order languages.

‡ In the jargon of VIII.3.1.

§ Looking back to VIII.1.9, this distinction between finite and infinite “amounts of information”
is aptly motivated.

We state here a few basic properties of $\Vdash^w$, all due to Cohen. Monotonicity
((2) in VIII.3.3 below), and the definability and truth lemmata below, will be the
ticket for the mathematician in $M$ to do forcing within his universe.


In most cases below we write $c$ or $c^G$ (depending on target structure) instead
of $\overline{c}$ in formulas. This notational simplification (argot) is achieved by using
mixed-mode notation, but employing round brackets nevertheless.


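VIII.3.3 Lemma (Cohen). Let $M$ be a CTM and $\mathfrak{P} \in M$, while $p \in |\mathfrak{P}|$ and
$q \in |\mathfrak{P}|$. For any sentences $\mathscr{A}$, $\mathscr{B}$ over $L_{\mathrm{Set},M}$ the following hold:


(1) Consistency: We cannot have both $p \Vdash^w \mathscr{A}$ and $p \Vdash^w \lnot\mathscr{A}$.
(2) Monotonicity: $p \Vdash^w \mathscr{A}$ and $q \le p$ imply $q \Vdash^w \mathscr{A}$.
(3) $p \Vdash^w \mathscr{A} \land \mathscr{B}$ iff $p \Vdash^w \mathscr{A}$ and $p \Vdash^w \mathscr{B}$.


Proof. (1): There is an $M$-generic $G$ with $p \in G$ (VIII.1.8), and we cannot have
both $\models_{\mathfrak{M}_G} \mathscr{A}$ and $\models_{\mathfrak{M}_G} \lnot\mathscr{A}$.
(2): Any $M$-generic $G$ is a filter, and hence $q \in G$ implies $p \in G$.
(3): $\models_{\mathfrak{M}_G} \mathscr{A} \land \mathscr{B}$ iff $\models_{\mathfrak{M}_G} \mathscr{A}$ and $\models_{\mathfrak{M}_G} \mathscr{B}$.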


VIII.3.4 Lemma (Definability Lemma). We fix a CTM $M$ and a notion of
forcing $\mathfrak{P} \in M$. For any $\mathscr{A}(x_1, \ldots, x_n)$ over $L_{\mathrm{Set},M}$ there is a $\mathscr{B}(y, x_1, \ldots, x_n)$
over $L_{\mathrm{Set},M}$ such that for all $p \in |\mathfrak{P}|$ and $x_1, \ldots, x_n$ in $M$,

$$\bigl(p \Vdash^w \mathscr{A}(x_1, \ldots, x_n)\bigr) \leftrightarrow \mathscr{B}^M(p, x_1, \ldots, x_n)$$

holds in $U_A$.


Granting the lemma, a being in $M$ can verify the right hand side of the above
equivalence (hence also the left) working in his world $M$ with the unrelativized
$\mathscr{B}$ (cf. VI.8.4).


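The following lemma says that truth in $\mathfrak{M}_G$ can be certified by working with
a finite approximation of $G$.


VIII.3.5 Lemma (Truth Lemma). Let $M$ be a CTM, and $\mathfrak{P} \in M$ be a notion
of forcing. For any $M$-generic $G \subseteq |\mathfrak{P}|$ and any formula $\mathscr{A}(\vec{x}_n)$ over $L_{\mathrm{Set},M}$,

$$\text{for all } x_i \in M, \quad \mathscr{A}^{M[G]}(x_1^G, \ldots, x_n^G) \leftrightarrow (\exists p \in G)\, p \Vdash^w \mathscr{A}(\vec{x}_n)$$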


holds in $U_A$.

The last two lemmata are proved in Section VIII.5 with the help of the
“original” concept of Cohen’s (strong) forcing.


VIII.4. Strong Forcing


We define here Cohen’s strong forcing relation, a syntactic concept that does
not refer to generic sets and therefore can be defined within $U_A$ or, indeed,
within $M$.

We will use the symbol “$\Vdash$” for strong forcing, without qualifying superscripts
or subscripts as to its “strength”. In fact it will be shown that for any
sentence $\mathscr{A}$ over $L_{\mathrm{Set},M}$ and $p \in |\mathfrak{P}|$,

$$p \Vdash^w \mathscr{A} \quad\text{iff}\quad (p \Vdash \lnot\lnot\mathscr{A})^M \tag{1}$$

The form of (1) immediately suggests that unlike $\Vdash^w$ (this being helped by
its semantical definition), $\Vdash$ does not subscribe to the proposition that an even
number of $\lnot$ symbols at the front of a formula can be dropped.

We will attempt to motivate the definition of the version of $\Vdash$ we use here.
This is the one in Shoenfield (1971) and is probably the user-friendliest in the
literature (compare with the versions in Cohen (1963) and Kunen (1980)†). The
reader is cautioned not to expect our motivational overture to unambiguously
lead to a unique choice of definition of $\Vdash$. Even the auxiliary relation $x \in_p y$
introduced below can be defined in different ways (see, e.g., Shoenfield (1967
vs. 1971)).

The crucial concept in motivating the definition of $\Vdash$ is that by using it, in $M$,
we can effect a syntactic approximation to truth in $M[G]$, i.e., an approximation
to what

$$G \models \mathscr{A}(\overline{a}_1, \ldots, \overline{a}_n)$$

or, more generally ($\mathscr{A}$ is over $L_{\mathrm{Set}}$),

$$G \models \mathscr{A}(x_1, \ldots, x_n)$$

means.


We begin by introducing a finite version of $x \in_G y$:


† Not only are the various versions of strong forcing not created equal in terms of definitional
complexity, but this is also true in terms of their behaviour. For example, in the version in Kunen
(1980), rather than (1) above one proves $p \Vdash^w \mathscr{A}$ iff $(p \Vdash \mathscr{A})^M$.

VIII.4.1 Definition. Let $\mathfrak{P}$ be a PO set. With $p \in |\mathfrak{P}|$ and quantification over
$|\mathfrak{P}|$ we introduce the abbreviation

$$x \in_p y \quad\text{stands for}\quad (\exists q \ge p)\,(x, q) \in y$$

Suppose that $\mathfrak{P} \in M$, where $M$ is a CTM of ZF. Thus, $x \in_p y$ is similar to
$x \in_G y$. In each case, the membership in $y$ (via $\in_G$ or $\in_p$) is settled by looking
at one appropriate finite piece of information that is part of $G$ or $p$ (recall that
when $q \ge p$, $q$ contains less information than $p$; cf. VIII.1.3). In the case
of $\in_p$, however, we even start with a finite amount of information, $p$, rather
than $G$.
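
The abbreviation can be compared with $\in_G$ on a three-element chain of conditions. This is a sketch under assumptions: conditions are frozensets of pairs ordered by reverse inclusion, atoms are strings, pairs are Python tuples, and `member_at` and `member_in` are hypothetical names for $\in_p$ and $\in_G$.

```python
def leq(p, q):
    """p <= q: p extends q (reverse inclusion)."""
    return q <= p

def member_at(x, p, y, P):
    """x in_p y: some condition q >= p has (x, q) in y."""
    return any((x, q) in y for q in P if leq(p, q))

def member_in(x, G, y):
    """x in_G y: some condition q in G has (x, q) in y."""
    return any((x, q) in y for q in G)

ONE = frozenset()
p0 = frozenset({(0, 1)})
p01 = frozenset({(0, 1), (1, 0)})      # p01 <= p0 <= ONE
P = [ONE, p0, p01]

y = frozenset({("a", p0)})             # "a" enters y once p0 is available
G = {ONE, p0, p01}                     # a filter containing p01
```

In this instance `member_at("a", p01, y, P)` holds while `member_at("a", ONE, y, P)` fails: the stronger condition carries enough information, the maximum condition does not. Since a filter is closed upward, `member_at(x, p, y, P)` with `p` in `G` gives `member_in(x, G, y)`.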

This relation is not explicitly defined in Shoenfield (1971), but is used there
in a crucial way to define $p \Vdash x \in y$ (see below). By contrast, in Shoenfield
(1967) $x \in_p y$ is explicitly introduced, but there it abbreviates something else,
namely, $(\forall q \le p)\,(x, q) \in y$; i.e., one looks at all finite extensions of $p$ in order
to settle whether $x \in_p y$.

The above considerations complement our earlier comment that this motivational
discussion will not deliver the definition of $\Vdash$ uniquely. In the end, the
definition of $\Vdash$ is meant to make the following three lemmata and (1) above
true. Any definition that works will do.


In the four lemmata immediately below, $M$ is an arbitrary set, possibly empty,
whose sole purpose is to enrich $L_{\mathrm{Set}}$ with a number of additional constants. It
bears no relation to $\mathfrak{P}$ in general.


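VIII.4.2 Lemma (Definability Lemma, Strong Forcing Version). Let $M$ be
a set, and $\mathfrak{P}$ a PO set.

For any $\mathscr{A}(x_1, \ldots, x_n)$ over $L_{\mathrm{Set},M}$ there is an $\mathscr{A}'(y, x_1, \ldots, x_n)$ over $L_{\mathrm{Set},M}$
such that for all $p \in |\mathfrak{P}|$ the following holds in $U_A$:

$$\bigl(p \Vdash \mathscr{A}(x_1, \ldots, x_n)\bigr) \leftrightarrow \mathscr{A}'(p, x_1, \ldots, x_n)$$

Just like $\models$ and $\vdash$, $\Vdash$ and $\Vdash^w$ apply to everything to their right (they have
lowest priority, hence maximum scope). This explains the brackets around
$p \Vdash \mathscr{A}(x_1, \ldots, x_n)$ above.


VIII.4.3 Lemma (Monotonicity Lemma, Strong Forcing Version). Let $M$
be a set, and $\mathfrak{P}$ a PO set. Assume that $p \in |\mathfrak{P}|$ and $q \in |\mathfrak{P}|$ with $q \le p$. Then
for any formula $\mathscr{A}(\vec{x}_n)$ over $L_{\mathrm{Set},M}$, the following holds in $U_A$:

$$\bigl(p \Vdash \mathscr{A}(\vec{x}_n)\bigr) \to \bigl(q \Vdash \mathscr{A}(\vec{x}_n)\bigr)$$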


©


VIII.4.4 Lemma (Consistency Lemma). Let M be a set, 𝔓 a PO set,
and p ∈ |𝔓|. Then for any formula 𝒜(x_1, ..., x_n) over L_{Set,M} and any
x_1, ..., x_n it is true in U_M that we cannot have both p ⊩ 𝒜(x_1, ..., x_n) and
p ⊩ ¬𝒜(x_1, ..., x_n).


VIII.4.5 Lemma (Quasi-completeness Lemma). Let M be a set, and 𝔓 a PO
set. For any p ∈ |𝔓| and formula 𝒜(x⃗_n) over L_{Set,M}, the following holds in
U_M: There is a q ≤ p, depending on x⃗_n, such that


(q ⊩ 𝒜(x⃗_n)) ∨ (q ⊩ ¬𝒜(x⃗_n))


VIII.4.6 Remark. In other words, for any fixed x⃗_n the set


{p ∈ |𝔓| : (p ⊩ 𝒜(x⃗_n)) ∨ (p ⊩ ¬𝒜(x⃗_n))}


is dense. If M is a CTM of ZF − P, x⃗_n is in M, and 𝔓 ∈ M, then relativizing to
M yields


({p ∈ |𝔓| : (p ⊩ 𝒜(x⃗_n)) ∨ (p ⊩ ¬𝒜(x⃗_n))} is dense)^M


Hence, since being dense is absolute for such a CTM (Exercise VIII.3),


{p ∈ |𝔓| : (p ⊩ 𝒜(x⃗_n))^M ∨ (p ⊩ ¬𝒜(x⃗_n))^M} is a dense set in M.


VIII.4.7 Lemma (Truth Lemma - Strong Forcing Version). Let M be a
CTM, and 𝔓 ∈ M be a notion of forcing. For any M-generic G ⊆ |𝔓| and any
formula 𝒜(x⃗_n) over L_{Set,M}, it is true in U_M (provable with no more than the
ZF axioms) that


(∀x⃗_n ∈ M^n)(𝒜^{M[G]}(x_1^G, ..., x_n^G) ↔ (∃p ∈ G)(p ⊩ 𝒜(x⃗_n))^M)


The definition of ⊩ will be given syntactically in the metatheory
(specifically, within U_M, using no more than the ZF axioms) in VIII.4.8 below.
We continue our work, pretending that we live in M, towards motivating the
final definition.

Defining p ⊩ U(a) is easy: we will let p ⊩ U(a) be true iff U(a) is true in
M (which is so, by VIII.2.9, iff U(a^G) is true in M[G]).


Syntactically, we will let the expression p ⊩ U(x), where x is a variable,
just stand for U(x).

Next we search for a good definition for p ⊩ a ∈ b and p ⊩ a = b, or, more
generally (with free variables), of p ⊩ x ∈ y and p ⊩ x = y.


So, what does ⊨_{M[G]} a ∈ b mean? It means that a^G ∈ b^G is true, that is, a ∈_G b
is true. This is, trivially, equivalent to (cf. I.6.2)


(∃z)(z = a ∧ z ∈_G b) (1)


To obtain a finite approximation of (1) we replace z = a and z ∈_G b by finite
approximations (we use strong forcing to approximate z = a). This leads to
(∃z)(z ∈_p b ∧ p ⊩ z = a), or, in the free variables version,


p ⊩ x ∈ y means (∃z)(z ∈_p y ∧ p ⊩ z = x) (2)

More explicitly (by VIII.4.1),

p ⊩ x ∈ y means (∃z)(∃q ≥ p)((z, q) ∈ y ∧ p ⊩ z = x) (3)


We have already indicated† that we will define what p ⊩ x ≠ y means and
then obtain the meaning for p ⊩ x = y indirectly. Thus, before we focus on ≠,
let us reflect on how to get p ⊩ ¬𝒜 from p ⊩ 𝒜 in general. Unlike the case
for ⊨, we must not set‡


p ⊩ ¬𝒜 iff ¬ p ⊩ 𝒜


for it is conceivable that p does not force the truth of 𝒜 simply because it
does not contain enough information to do so. Thus we need to know that no
amount of additional finite information (any q such that q ≤ p) will help to
force 𝒜. Then we can proclaim that p forces ¬𝒜. Thus, we will adopt


p ⊩ ¬𝒜 iff (∀q ≤ p)¬ q ⊩ 𝒜 (4)


With this settled, what does ⊨_{M[G]} a ≠ b mean? It means that a^G = b^G is
false, and therefore one of the following cases obtains:


(i) U(a) ∧ U(b) ∧ a ≠ b (recall that U(a) iff U(a^G))
(ii) U(a) ∧ ¬U(b)
(iii) ¬U(a) ∧ U(b)
(iv) (∃z)((z ∈ a^G ∧ z ∉ b^G) ∨ (z ∈ b^G ∧ z ∉ a^G))


† Actually, it turns out to be technically somewhat more convenient to look for a definition of
p ⊩ a ≠ b, viewing ≠ as the primary (in)equality predicate and = as a derived one.

‡ The first ¬ is formal, part of the formula ¬𝒜. The second is metamathematical. Some writers
use p ⊮ 𝒜 instead.

Condition (iv) is equivalent to


(∃z)((z ∈_G a ∧ z ∉_G b) ∨ (z ∈_G b ∧ z ∉_G a)) (5)


where we have written x ∉_G y for ¬x ∈_G y. A finite approximation of (5) that
(as we will see) works is obtained by using ∈_p for the positive cases and p ⊩
for the negative cases. We are ready to summarize in a definition.


VIII.4.8 Definition. (Shoenfield (1971).) We fix a PO set 𝔓 = (P, ≤, 1) and
a set M (possibly empty, a provider of constants) and define within U_M (using
no more than the ZF − P axioms) the symbol p ⊩ 𝒜, for any formula 𝒜 of L_{Set,M}.
We do so by induction on formulas:


(a) p ⊩ U(x) stands for U(x).
(b) p ⊩ x ∈ y is given by (2) (p. 535) (equivalently, (3), p. 535).
(c) p ⊩ x ≠ y stands for


(U(x) ∧ U(y) ∧ x ≠ y) ∨ (U(x) ∧ ¬U(y)) ∨ (¬U(x) ∧ U(y))
∨ (∃z)((z ∈_p x ∧ p ⊩ z ∉ y) ∨ (z ∈_p y ∧ p ⊩ z ∉ x))


(d) p ⊩ ¬𝒜 iff (∀q ≤ p)¬ q ⊩ 𝒜.
(e) p ⊩ 𝒜 ∨ ℬ iff p ⊩ 𝒜 or p ⊩ ℬ.
(f) p ⊩ (∃x)𝒜[x] iff (∃x)(p ⊩ 𝒜[x]).


VIII.4.9 Remark. In (b) and (c) above an occurrence of q ⊩ u ∉ w is (in view
of (d)) an abbreviation for


(∀r ≤ q)¬ r ⊩ u ∈ w (6)


while (since u = w is an abbreviation of ¬u ≠ w) q ⊩ u = w is an abbreviation
for


(∀r ≤ q)¬ r ⊩ u ≠ w (7)


Using (6) and (7), we can thus rewrite clauses (b) and (c) of the definition to
read


p ⊩ x ∈ y abbreviates (∃z)(∃q ≥ p)((z, q) ∈ y ∧ (∀r ≤ p)¬ r ⊩ z ≠ x) (8)

and


p ⊩ x ≠ y abbreviates
(U(x) ∧ U(y) ∧ x ≠ y) ∨ (U(x) ∧ ¬U(y)) ∨ (¬U(x) ∧ U(y)) ∨
(∃z)(∃q ≥ p)[((z, q) ∈ x ∧ (∀r ≤ p)¬ r ⊩ z ∈ y) ∨
((z, q) ∈ y ∧ (∀r ≤ p)¬ r ⊩ z ∈ x)] (9)


respectively. This now makes sense out of Definition VIII.4.8(b)-(c), since (8)
and (9) constitute a simultaneous recursion in the sense of VI.2.40.

More rigorously, then, what Definition VIII.4.8(b) and (c) really mean is that
we employ the abbreviations


for any p ∈ |𝔓|, p ⊩ x ∈ y stands for In(p, x, y) = 0 (10)
and
for any p ∈ |𝔓|, p ⊩ x ≠ y stands for Ne(p, x, y) = 0 (11)


where the functions λpxy.In(p, x, y) and λpxy.Ne(p, x, y) (with right field
{0, 1}) are defined in U_M by the simultaneous recursion (8′) + (9′) below,
mimicking (8) and (9):


In(p, x, y) = 0 if p ∈ |𝔓| ∧ (∃z)(∃q ≥ p)((z, q) ∈ y ∧ (∀r ≤ p)Ne(r, z, x) = 1)
In(p, x, y) = 1 otherwise (this includes the case p ∉ |𝔓|)
(8′)


and


Ne(p, x, y) = 0 if p ∈ |𝔓| ∧ ((U(x) ∧ U(y) ∧ x ≠ y) ∨ (U(x) ∧ ¬U(y))
    ∨ (¬U(x) ∧ U(y))
    ∨ (∃z)(∃q ≥ p)[((z, q) ∈ x ∧ (∀r ≤ p)In(r, z, y) = 1)
    ∨ ((z, q) ∈ y ∧ (∀r ≤ p)In(r, z, x) = 1)])
Ne(p, x, y) = 1 otherwise
(9′)


That the recursion (8′)-(9′) is legitimate follows from the following
considerations: We note that (8′) implies z ∈ dom(y) (by (z, q) ∈ y) and hence
max(ρ(z), ρ(x)) < max(ρ(x), ρ(y)). Similarly, (9′) implies z ∈ dom(x) (by
(z, q) ∈ x) or z ∈ dom(y) (by (z, q) ∈ y); thus max(ρ(z), ρ(y)) < max(ρ(x),
ρ(y)) and max(ρ(z), ρ(x)) < max(ρ(x), ρ(y)) respectively. It follows that the
recursion is with respect to the relation


(p, u, v) P (q, x, y) iff {p, q} ⊆ |𝔓| ∧ max(ρ(u), ρ(v)) < max(ρ(x), ρ(y))


which has MC and is left-narrow (verify!) as required.

We now recast the entire Definition VIII.4.8 in a rigorous light: (a) and (d)-(f)
remain the same, giving meaning to the left hand side in terms of the right hand
side. (b) and (c) are now (10) and (11) above.
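
The simultaneous recursion for p ⊩ x ∈ y and p ⊩ x ≠ y can be tried out on a toy example. The following is only a sketch under stated assumptions: the three-element poset, the coding of names as Python frozensets, and all identifiers (forces_in, forces_ne, forces_eq, EMPTY, A) are illustrative choices that do not come from the text, and urelements are left out, so the three U-disjuncts of clause (c) are vacuous.

```python
from functools import lru_cache

# Hypothetical toy poset: finite functions {0} -> {0, 1}, coded as frozensets
# of (argument, value) pairs; p <= q means that p extends q.
ONE = frozenset()
P0 = frozenset({(0, 0)})
P1 = frozenset({(0, 1)})
CONDITIONS = (ONE, P0, P1)

def leq(p, q):
    """p <= q: p carries at least the information in q (reverse inclusion)."""
    return p >= q

def below(p):
    """All conditions r with r <= p."""
    return [r for r in CONDITIONS if leq(r, p)]

# A name is a frozenset of (name, condition) pairs (no urelements here).

@lru_cache(maxsize=None)
def forces_in(p, x, y):
    """p forces x in y: some (z, q) in y with q >= p, and no r <= p forces z != x."""
    return any(leq(p, q) and all(not forces_ne(r, z, x) for r in below(p))
               for (z, q) in y)

@lru_cache(maxsize=None)
def forces_ne(p, x, y):
    """p forces x != y, set case only: a member of one side is forced out of the other."""
    left = any(leq(p, q) and all(not forces_in(r, z, y) for r in below(p))
               for (z, q) in x)
    right = any(leq(p, q) and all(not forces_in(r, z, x) for r in below(p))
                for (z, q) in y)
    return left or right

def forces_eq(p, x, y):
    """p forces x = y, read as p forcing the negation of x != y."""
    return all(not forces_ne(r, x, y) for r in below(p))

EMPTY = frozenset()               # a name for the empty set
A = frozenset({(EMPTY, P0)})      # its value contains the empty set exactly when P0 is in G
```

In this toy setting P0 forces EMPTY ∈ A and EMPTY ≠ A, P1 forces EMPTY = A, and the maximum condition ONE forces neither EMPTY ≠ A nor EMPTY = A, in line with the quasi-completeness lemma: an extension of ONE decides the question.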


By the absoluteness results of Section VI.8 (in particular, absoluteness of
ranks; cf. also VI.8.23 and VI.8.24), both In and Ne are absolute for any CTM
M of ZF − P. Let then M be such a CTM with 𝔓 ∈ M. Relativizing to M, we
have, for all a and b in M and p ∈ |𝔓|,


In^M(p, a, b) = In(p, a, b) and Ne^M(p, a, b) = Ne(p, a, b) (12)
or


In^M = In ↾ M and Ne^M = Ne ↾ M (13)


© 


Proof of Lemma VIII.4.2. We do induction on formulas. A trivial preliminary
remark is that “stands for” can be replaced by “↔”. For example, (10) can be
rewritten as


For any p ∈ |𝔓|, (p ⊩ x ∈ y) ↔ In(p, x, y) = 0

for if we replace the abbreviation p ⊩ x ∈ y by what it abbreviates, we get a
tautology.
For the basis, 𝒜 may be

(i) U(x). We then take 𝒜′(p, x) ≡ U(x).
(ii) x ∈ y. Then use 𝒜′(p, x, y) ≡ In(p, x, y) = 0.
(iii) x ≠ y. Then use 𝒜′(p, x, y) ≡ Ne(p, x, y) = 0.


For the induction steps, 𝒜 may be


(I) ¬ℬ(x⃗). Then we can let 𝒜′(p, x⃗) ≡ (∀q ≤ p)¬ℬ′(q, x⃗).
(II) ℬ ∨ 𝒞. Then we let 𝒜′ ≡ ℬ′ ∨ 𝒞′.
(III) (∃y)ℬ(y, x⃗). Then we set 𝒜′(p, x⃗) ≡ (∃y)ℬ′(p, y, x⃗).


VIII.4.10 Corollary. Let M be a CTM of ZF − P, and 𝔓 ∈ M a notion of
forcing. For any formula 𝒜(x⃗_n) over L_Set there is a formula ℬ(y, x⃗_n) over
L_Set such that for all a_1, ..., a_n in M and all p ∈ |𝔓|,


⊨_{U_M} (p ⊩ 𝒜(a_1, ..., a_n))^M iff ⊨_𝔐 ℬ[p, a_1, ..., a_n]


where 𝔐 = (M, U, ∈).
Proof. Relativizing VIII.4.2 to M, we get

(∀p ∈ |𝔓|)(∀x⃗_n ∈ M)((p ⊩ 𝒜(x⃗_n))^M ↔ 𝒜′^M(p, x_1, ..., x_n))

in particular, for any p ∈ |𝔓| and constants a_1, ..., a_n in M,†


(p ⊩ 𝒜(a_1, ..., a_n))^M ↔ 𝒜′^M(p, a_1, ..., a_n)


Setting ℬ ≡ 𝒜′ and rewriting the right hand side of the above in [...]
notation, we are done.


Caution. p ⊩ U(x), p ⊩ x ∈ y, and p ⊩ x ≠ y are absolute for CTMs, as noted
in VIII.4.9. Thus, e.g., (p ⊩ x ∈ y)^M is equivalent to p ⊩ x ∈ y for all p, x, y
in M. However, p ⊩ ..., in general, is not absolute. ©


Proof of Lemma VIII.4.3. We do induction on formulas, following Definition
VIII.4.8. Now, 𝒜 may be one of


(i) U(x): Then the result is immediate.

(ii) x ∈ y: Let p ⊩ x ∈ y and s ≤ p. By assumption we have (8) of Remark
VIII.4.9. This remains valid if we replace the letter p by s throughout
(transitivity of ≤).

(iii) x ≠ y: Exactly as in the previous case (using (9) of VIII.4.9).


For the induction steps, 𝒜 may be


(I) ¬ℬ: Let q ≤ p and (∀r ≤ p)¬ r ⊩ ℬ. Then (∀r ≤ q)¬ r ⊩ ℬ, that is,
q ⊩ ¬ℬ. (The I.H. was not used.)
(II) ℬ ∨ 𝒞: Exercise.
(III) (∃y)ℬ(y, x⃗): Let q ≤ p and p ⊩ 𝒜. That is, (∃y) p ⊩ ℬ(y, x⃗). By the
I.H., (∃y) q ⊩ ℬ(y, x⃗).


Proof of Lemma VIII.4.4. Fix x⃗. Then p ⊩ ¬𝒜(x⃗) means (∀q ≤ p)¬ q ⊩ 𝒜(x⃗).
In particular, ¬ p ⊩ 𝒜(x⃗).


† We have opted here for the notation “𝒜′^M(...)” rather than the awkward “(𝒜′)^M(...)”.

Proof of Lemma VIII.4.5. Fix x⃗ and p ∈ |𝔓|. If p ⊩ ¬𝒜(x⃗), then we are
done with q = p. Else, there is a q ≤ p such that q ⊩ 𝒜(x⃗) by VIII.4.8, the
¬-case.


Proof of Lemma VIII.4.7. Fix an M-generic set G. We do induction on formulas,
following Definition VIII.4.8. For the basis we do induction on max(ρ(x), ρ(y)).
Now, 𝒜 may be one of


(i) U(x): Trivial. Forcing = truth in this case.
(ii) x ∈ y:
→: We fix x and y in M such that x^G ∈ y^G is true. Thus, (∃z)(z =
x ∧ z ∈_G y) (cf. (1) of p. 535). Let c (auxiliary constant) be a z that
works. By the I.H. (on max(ρ(x), ρ(y))), let p ∈ G be such that†


p ⊩ c = x (a)


Let also q ∈ G such that (c, q) ∈ y. There is an r ∈ G that witnesses
compatibility of p and q. Then r ⊩ c = x by VIII.4.3 and (a) above.
Thus


(∃z)(∃q ≥ r)((z, q) ∈ y ∧ r ⊩ z = x) (b)


in short, (∃r ∈ G) r ⊩ x ∈ y.‡
←: Let x and y be in M and assume (b), with r ∈ G. Let c work for z.
Then q ∈ G by filter properties; thus c ∈_G y; hence


c^G ∈ y^G (d)


Moreover, the I.H. (on max(ρ(x), ρ(y))) implies that c^G = x^G.
Combining with (d) (via the Leibniz axiom), we get x^G ∈ y^G.
(iii) x ≠ y:
→: We fix x and y in M such that x^G ≠ y^G is true. We have cases:

(a) U(x^G) ∧ U(y^G). Then x = x^G and y = y^G (VIII.2.9), and thus
x ≠ y. By VIII.4.8, any p ∈ |𝔓| satisfies p ⊩ x ≠ y. Taking, for
example, p = 1, we have such a p in G.

(b) U(x^G) ∧ ¬U(y^G). Then (VIII.2.9) U(x) ∧ ¬U(y), and any
p ∈ |𝔓| satisfies p ⊩ x ≠ y (VIII.4.8). We conclude as in the
previous case.

(c) ¬U(x^G) ∧ U(y^G). As above.

† The I.H. applies to atomic formulas, here c ≠ x. However, it is all right to apply it to negated
atomic formulas, here c = x, by the “¬” case below.
‡ By the remark following the proof of VIII.4.10, it is unnecessary to write (r ⊩ x ∈ y)^M.

(d) ¬U(x^G) ∧ ¬U(y^G). Thus,

(∃z ∈ dom(x) ∪ dom(y))((z^G ∈ x^G ∧ z^G ∉ y^G) ∨ (z^G ∈ y^G ∧ z^G ∉ x^G)) (1)

For the sake of argument, say it is the first of the two (∨-)cases
of (1) above that holds for some z. Then z ∈_G x, i.e., for some
p ∈ G,


(z, p) ∈ x (2)


Moreover, since ρ(z) < ρ(x) by (2), max(ρ(z), ρ(y)) < max(ρ(x),
ρ(y)); thus the I.H. implies, for some q ∈ G,


q ⊩ z ∉ y (3)


Let r ∈ G satisfy r ≤ q and r ≤ p. Then (2) yields z ∈_r x, and
(3) yields r ⊩ z ∉ y by VIII.4.3. Thus r ⊩ x ≠ y. The other case
in (1) is entirely analogous.
←: We fix p ∈ G and x and y in M such that p ⊩ x ≠ y. We have
cases.

(a′) U(x) ∧ U(y). By VIII.4.8, x ≠ y holds. Since x = x^G and y = y^G
(VIII.2.9), x^G ≠ y^G holds.

(b′) U(x) ∧ ¬U(y). Then (VIII.2.9) U(x^G) ∧ ¬U(y^G). Thus x^G ≠ y^G
holds.

(c′) ¬U(x) ∧ U(y). As above.

(d′) ¬U(x) ∧ ¬U(y). By VIII.4.8,


(∃z ∈ dom(x) ∪ dom(y))((∃q ≥ p)((z, q) ∈ x ∧ p ⊩ z ∉ y)
∨ (∃q ≥ p)((z, q) ∈ y ∧ p ⊩ z ∉ x)) (4)


For the sake of argument, say it is the first of the two (∨-)cases
of (4) above that holds for some z. Then z ∈_G x, since q ∈ G by
filter properties; hence


z^G ∈ x^G (5)


ρ(z) < ρ(x) by z ∈ dom(x) implies max(ρ(z), ρ(y)) < max(ρ(x),
ρ(y)); hence, from p ⊩ z ∉ y and the I.H.,


z^G ∉ y^G (6)
(5) and (6) now yield x^G ≠ y^G. The other case in (4) is entirely
analogous.

For the induction steps, with respect to formulas, 𝒜 may be
(I) ¬ℬ(x⃗_n): Fix x_i, i = 1, ..., n in M. By Remark VIII.4.6 and M-genericity
of G,


(∃p ∈ G)((p ⊩ ℬ(x⃗_n))^M ∨ (p ⊩ ¬ℬ(x⃗_n))^M) (7)


→: Assume ¬ℬ^{M[G]}(x_1^G, ..., x_n^G). Let q ∈ G make (7) true. Then
(q ⊩ ¬ℬ(x⃗_n))^M, since the alternative and the I.H. would yield
ℬ^{M[G]}(x_1^G, ..., x_n^G), contradicting the assumption.

←: Let p ∈ G be such that (p ⊩ ¬ℬ(x⃗_n))^M. Then (VIII.4.4)
(p ⊮ ℬ(x⃗_n))^M.

Is it possible that, for some r ∈ G, (r ⊩ ℬ(x⃗_n))^M? Well, if so, let
s ∈ G witness the compatibility of p and r. Then (VIII.4.3)


(s ⊩ ℬ(x⃗_n))^M ∧ (s ⊩ ¬ℬ(x⃗_n))^M

contradicting VIII.4.4. Thus,

(∀p ∈ G)(p ⊮ ℬ(x⃗_n))^M
or
¬(∃p ∈ G)(p ⊩ ℬ(x⃗_n))^M


By the I.H., this translates to “ℬ^{M[G]}(x_1^G, ..., x_n^G) is false in U_M”.
Hence ¬ℬ^{M[G]}(x_1^G, ..., x_n^G) is true.
(II) ℬ ∨ 𝒞: Exercise.
(III) (∃y)ℬ(y, x⃗): Fix x⃗ in M.
→: Let


(∃y ∈ M[G])ℬ^{M[G]}(y, x_1^G, ..., x_n^G) (8)

be true. Let y = a work above. Then for some b in M, a = b^G and

ℬ^{M[G]}(b^G, x_1^G, ..., x_n^G) (9)

is true. By the I.H.,

(∃p ∈ G)(p ⊩ ℬ(b, x_1, ..., x_n))^M (10)

is true; hence

(∃p ∈ G)(p ⊩ (∃y)ℬ(y, x_1, ..., x_n))^M (11)

is true, by VIII.4.8 (∃-case).
←: Let (11) hold for the given x⃗. Then (10) holds by VIII.4.8 (∃-case)
for some b ∈ M. Therefore, by the I.H., (9) holds. Thus we have (8).


VIII.5. Strong vs. Weak Forcing


We can now connect ⊩ and ⊩^w.


VIII.5.1 Theorem. Fix a CTM M and a PO set 𝔓 ∈ M. Then for any formula
𝒜(x⃗_n) over L_{Set,M}, and all x⃗_n in M, (1) on p. 532 holds in U_M.


Proof. Fix x_1, ..., x_n in M. We prove that

(p ⊩^w 𝒜(x⃗_n)) ↔ (p ⊩ ¬¬𝒜(x⃗_n))^M (2)

holds in U_M.


←: Assume the right hand side of ↔, and let G ⊆ |𝔓| be M-generic
and p ∈ G (cf. VIII.1.8). By VIII.4.7, ¬¬𝒜^{M[G]}(x_1^G, ..., x_n^G) is true; hence
so is 𝒜^{M[G]}(x_1^G, ..., x_n^G). By Definition VIII.3.2, p ⊩^w 𝒜(x⃗_n), since G (with
p ∈ G) was an arbitrary generic set.


→: We prove the contrapositive, so let


¬(p ⊩ ¬¬𝒜(x⃗_n))^M (3)

Thus, for some q ≤ p,

(q ⊩ ¬𝒜(x⃗_n))^M (4)


By VIII.1.8, let G be M-generic and q ∈ G. By VIII.4.7, (4) yields the truth of


¬𝒜^{M[G]}(x_1^G, ..., x_n^G)


By filter properties (q ≤ p and q ∈ G), p ∈ G. Definition VIII.3.2 then yields ¬(p ⊩^w 𝒜(x⃗_n)).
One now obtains the Lemmata VIII.3.4 and VIII.3.5 at once:


Proof of Lemma VIII.3.4. By VIII.5.1 and VIII.4.2 we can take


ℬ(p, x⃗) ≡ (p ⊩ ¬¬𝒜(x⃗))


Proof of Lemma VIII.3.5. ←: Pick any M-generic G. Let p ∈ G, x_i (i =
1, ..., n) be in M, and p ⊩^w 𝒜(x⃗_n). By Definition VIII.3.2, 𝒜^{M[G]}(x_1^G, ..., x_n^G)
holds.
→: Pick any M-generic G and x_i (i = 1, ..., n) in M. Assume now that
𝒜^{M[G]}(x_1^G, ..., x_n^G) holds. Then also (¬¬𝒜(x_1^G, ..., x_n^G))^{M[G]} holds. By
VIII.4.7,


(∃p ∈ G)(p ⊩ ¬¬𝒜(x_1, ..., x_n))^M


We are done by VIII.5.1.


VIII.6. M[G] Is a CTM of ZFC If M Is 


Let M be a CTM for ZFC, 𝔓 ∈ M be a notion of forcing in M, and G ⊆ |𝔓|
be M-generic. We will show that M[G] is also a CTM of ZFC, indeed the
⊆-smallest.

So far we know that M[G] is countable and transitive (VIII.2.11) and is a
subset of any CTM N such that M ⊆ N and G ∈ N (VIII.2.5). It remains to
see that indeed it is a model of ZFC.


VIII.6.1 Lemma. For any x ∈ M, ρ(x^G) ≤ ρ(x).


Proof. We do ∈-induction. If U(x^G), then also U(x); thus ρ(x^G) = ρ(x) = 0
(VI.6.24). Next, assume that ¬U(x^G). Then


ρ(x^G) = (⋃ {ρ(y^G) : y^G ∈ x^G}) + 1
       ≤ (⋃ {ρ(y) : (∃p ∈ G)(y, p) ∈ x}) + 1    by I.H.
       ≤ (⋃ {ρ(z) : z ∈ x}) + 1    since ρ(y) < ρ((y, p))
       = ρ(x)


VIII.6.2 Lemma. On^M = On^{M[G]}.


Proof. On^M = {x ∈ M : (x ∈ On)^M} = {x ∈ M : x ∈ On} = M ∩ On, the
second “=” by absoluteness of (x ∈ On) for transitive classes. That is, On^M is
the set of all real† ordinals found inside M. To an inhabitant of M, On^M is a
proper class, that is,


On^M ∉ M (1)


Indeed, On^M is a (real) ordinal, for it contains just ordinals and is a transitive set
(intersection of two transitive classes). If On^M ∈ M, then On^M ∈ On ∩ M =
On^M, which is impossible. It also happens to be the smallest (real) ordinal not in M: If α < On^M,
then α ∈ On^M = M ∩ On.

Similarly, transitivity of M[G] yields that On^{M[G]} = M[G] ∩ On and that
On^{M[G]} is the smallest ordinal not in M[G]. Since, trivially, On^M ⊆ On^{M[G]},
that is, On^M ≤ On^{M[G]}, we will be done if we can show that


On^M ∉ M[G] (2)


Well, if (2) is false, then On^M = c^G for some c ∈ M, and hence (VIII.6.1)
ρ(On^M) ≤ ρ(c). Since (ρ(c) ∈ On)^M is true (why?), we have ρ(c)^M ∈ On^M,
that is, ρ(c) < On^M (absoluteness of ρ), a contradiction (from ρ(On^M) =
On^M + 1; cf. VI.6.21).


VIII.6.3 Remark. By absoluteness of ω and of natural numbers, n ∈ On and
ω ∈ On relativize (for any transitive class 𝕄) to n ∈ On^𝕄 and ω ∈ On^𝕄
respectively. In particular, 0, 1, 2, ..., ω are in both M and M[G]. ©


VIII.6.4 Theorem. The M[G]-relativizations of the ZFC axioms are true.


Proof.


(i) Urelement axioms:
(a) Urelements are atomic: Let y ∈ M[G]. We want the truth of
(U(y) → ¬(∃x)x ∈ y)^{M[G]}. That is, of U(y) → ¬(∃x ∈ M[G])x ∈ y,
which is true even without the qualification (∃x)(x ∈ M[G] ∧ ...).
(b) Set of all atoms: We want {x : U(x)}^{M[G]} ∈ M[G]. That is, (M[G] ∩
{x : U(x)}) ∈ M[G]. This is so by VIII.2.9 (whence M[G] ∩
{x : U(x)} = M ∩ {x : U(x)}) along with M ⊆ M[G] and the fact
that M is a CTM, so that {x : U(x)}^M ∈ M.
(ii) Extensionality: It holds because M[G] is transitive (VI.8.10).
(iii) Separation: Let a ∈ M and b = {x ∈ a^G : 𝒜^{M[G]}(x)}, where 𝒜(x) is
over L_{Set,M}. We let


c = {(x, p) ∈ dom(a) × |𝔓| : p ⊩^w x ∈ a ∧ 𝒜(x)} (1)


† Not relativized.

By VIII.3.4, the condition to the right of “:” is equivalent to a formula
relativized to M. Moreover, dom(a) × |𝔓| ∈ M, since M is a CTM of
ZFC. Thus c ∈ M. We now verify that b = c^G, and we are done. Indeed:
⊇: Let z ∈ c^G. Then z = x^G for some x satisfying (x, p) ∈ c,
for some p ∈ G. Then (1) yields that x ∈ dom(a) and that
x^G ∈ a^G ∧ 𝒜^{M[G]}(x^G) is true by VIII.3.5. This says x^G ∈ b.
⊆: Let z ∈ b. Then z ∈ a^G; therefore z = x^G for some x ∈ dom(a).
The second part of the entrance requirement in b yields the truth of
𝒜^{M[G]}(x^G). Thus, x ∈ dom(a) ∧ x^G ∈ a^G ∧ 𝒜^{M[G]}(x^G) is true.
By VIII.3.5, x ∈ dom(a) ∧ p ⊩^w x ∈ a ∧ 𝒜(x) is true for some
p ∈ G ⊆ |𝔓|. Thus, (x, p) ∈ c by (1). But then x ∈_G c; hence
x^G ∈ c^G.
The case with parameters, 𝒜(x, y⃗), presents no additional difficulties.
(iv) Pairing: By VIII.2.10.
(v) Union: We want to prove that for any a^G ∈ M[G] a set b^G ∈ M[G]
exists such that (⋃ is absolute for transitive classes; cf. Section VI.8)


⋃a^G ⊆ b^G


We concentrate on the case where ¬U(a^G), for otherwise the result is
trivial by absoluteness of ∅ for transitive classes (take b^G = ∅).
Now, analyzing the issue inside U_M, we find that


⋃a^G = {x^G : (∃y ∈ dom(a)) x^G ∈ y^G} (2)


But then, taking b = ⋃dom(a) is what we want: Indeed, first, as M is
a CTM for ZFC, and ⋃ and dom are absolute, we have b ∈ M; hence
b^G ∈ M[G]. Moreover, by (2), if z ∈ ⋃a^G, then z has the form x^G where,
for some y ∈ dom(a), x ∈_G y. Say (x, p) ∈ y for some p ∈ G. Thus
(x, p) ∈ b; hence x^G ∈ b^G.

(vi) Foundation: Holds in any class (VI.8.11).

(vii) Collection: Again we look into the parameterless situation. Suppose
that we know that


(∀x ∈ a^G)(∃y ∈ M[G])𝒜^{M[G]}(x, y) is true (3)


where 𝒜 is over L_{Set,M} and a ∈ M. We want to show that a set b ∈ M
exists such that


(∀x ∈ a^G)(∃y ∈ b^G)𝒜^{M[G]}(x, y) is true (4)


Since collection is true in M, the following holds in M (we have implicitly
invoked VIII.3.4 to obtain a formula, relativized to M, equivalent to
“p ⊩^w 𝒜(z, u)”):

(∀(z, p) ∈ dom(a) × |𝔓|)(∃u ∈ M)(p ⊩^w 𝒜(z, u))
→ (∃W ∈ M)(∀(z, p) ∈ dom(a) × |𝔓|)(∃u ∈ W) p ⊩^w 𝒜(z, u)


Or, moving (∃W) to the front and asserting it to be a set (permissible
by III.8.4),


there is a set W in M such that
(∀(z, p) ∈ dom(a) × |𝔓|)(∃u ∈ M)(p ⊩^w 𝒜(z, u)) (5)
→ (∀(z, p) ∈ dom(a) × |𝔓|)(∃u ∈ W) p ⊩^w 𝒜(z, u)


Now fix a W ∈ M that verifies (5), and let b = W × {1}. Clearly, by
closure of M under × and pairs, b ∈ M. We will show that b^G works for (4).
First off, an easy calculation (similar to that in the proof of VIII.2.8) gives


b^G = {y^G : y ∈ W} (6)


Towards (4), let x ∈ a^G. Then x = v^G for some v ∈ M and, moreover,
v ∈_G a. Therefore


v ∈ dom(a) (7)


By (3) fix a y ∈ M[G] that makes 𝒜^{M[G]}(v^G, y) true. Since y = t^G for
some t ∈ M, we have the truth of 𝒜^{M[G]}(v^G, t^G); hence, for said v and
t, and some p ∈ G (by VIII.3.5),


p ⊩^w 𝒜(v, t) (8)


By (7) and (8), using z = v and u = t in (5), we have satisfied an instance of
the hypothesis of (5). Thus, there is some c ∈ W such that p ⊩^w 𝒜(v, c).
By the truth lemma (recall that the p we are talking about is in G),


𝒜^{M[G]}(v^G, c^G) is true


Moreover, c^G ∈ b^G by (6). Thus c^G will do as an instance of y in (4).
(viii) Power set: Let a ∈ M. We want to show that for some b ∈ M,


{x ∈ M[G] : x ⊆ a^G} ⊆ b^G (9)


It will turn out that b = P(dom(a) × |𝔓|)^M × {1} works. First off, since
M satisfies ZFC, b ∈ M. The same type of calculation we have done in
the collection case yields


b^G = {z^G : z ∈ P(dom(a) × |𝔓|)^M} (10)

Let now x ∈ M[G] ∧ x ⊆ a^G. We show that x ∈ b^G. By hypothesis,
x = c^G for some c ∈ M. We form


d = {(z, p) ∈ dom(a) × |𝔓| : p ⊩^w z ∈ c} (11)


By VIII.3.4, d ∈ M, since separation is true in M. We now see that d is
a “better” name for x above (i.e., for c^G) than c is, because, using the
name d, it is easy to show that d^G ∈ b^G. We have two claims here: First,
c^G = d^G.

⊆: Let y ∈ c^G. Then y = z^G for some z ∈ dom(a) (recall that c^G ⊆
a^G). By the truth lemma, there is a p ∈ G such that p ⊩^w z ∈ c;
hence, by (11), (z, p) ∈ d. Hence, z^G ∈ d^G (for z ∈_G d).

⊇: Let y ∈ d^G. Then y = z^G for some z ∈ M, and there is a p ∈ G
such that (z, p) ∈ d. By (11), z ∈ dom(a) and p ⊩^w z ∈ c.
Remembering that p ∈ G, we get z^G ∈ c^G by VIII.3.5.

Second, we show that d^G ∈ b^G, which concludes our task. But this is so
by (10), since d ∈ P(dom(a) × |𝔓|)^M.
(ix) Infinity: ω ∈ M ⊆ M[G].

(x) AC: It is convenient to use the version of AC given by Corollary VI.5.53.
Fix then a set x ∈ M[G]. For some set a ∈ M, x = a^G. By the corollary,
let f = {(β, f(β)) : β ∈ α} in M such that dom(a) ⊆ ran(f) (possible
by AC^M). Recalling VIII.2.10 and VIII.2.6, we let


F = {{B1B, £2) x} x {Beal xy (2) 


Now F ∈ M. What is F^G? Using VIII.2.10 and VIII.2.6, we calculate
(as we did to obtain (6) above)


FS = {({6, (8, fB)} x (} x ()° : Bea} 
= {{6.(6, £(B)%)} : Bea} 
= {(6. £8)°) Bal 


Thus F^G is a function in M[G] with domain α (VIII.6.2). If y^G ∈ a^G,
then y ∈ dom(a); hence y = f(β) for some β ∈ α by choice of f, and
therefore y^G = f(β)^G. Thus a^G ⊆ ran(F^G).
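
The calculation b^G = {y^G : y ∈ W} for b = W × {1}, used in the collection and power set cases above, can be checked mechanically on small examples. The sketch below is illustrative only: it assumes names without urelements coded as Python frozensets of (name, condition) pairs, and the conditions ONE, P0, P1, the filters G0, G1, and the names EMPTY, A, W, B are hypothetical choices, not objects from the text.

```python
def val(name, G):
    """x^G = { y^G : (y, p) in x for some p in G }, for a name x without urelements."""
    return frozenset(val(y, G) for (y, p) in name if p in G)

# Hypothetical toy conditions: finite functions {0} -> {0, 1} as frozensets of pairs.
ONE = frozenset()                   # the maximum condition 1 (empty function)
P0 = frozenset({(0, 0)})
P1 = frozenset({(0, 1)})

EMPTY = frozenset()                 # a name for the empty set
A = frozenset({(EMPTY, P0)})        # evaluates to {∅} exactly when P0 is in G
W = {EMPTY, A}                      # a set of names
B = frozenset((y, ONE) for y in W)  # the name W × {1}

G0 = {ONE, P0}                      # a filter containing P0
G1 = {ONE, P1}                      # a filter containing P1
```

Since the maximum condition belongs to every filter, each pair (y, 1) in W × {1} contributes y^G, which is all that (6) and (10) use; with G0 the value of B is {∅, {∅}}, while with G1 both names in W evaluate to ∅ and the value of B is {∅}.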


We have been repetitiously insisting throughout this chapter that a being in M
be empowered to “force things to happen” in M[G] for any M-generic G. The
proof of the above theorem clarifies the meaning of that intention and shows
that it is wise and feasible: A being in M knows that ZFC is true in M. We
have ensured that he can use this knowledge and the p ⊩^w ... construct, which
is expressible in M, to force the truth of the ZFC axioms in a world (M[G])
that, for him, is “imaginary” or unreachable.

Actually he can do more, and this is the subject of the next section: By 
choosing the notion of forcing 58 in M appropriately, he can force all sorts of 
weird things to happen in the imaginary world M[G], such as 2*° = &x,. And, 
of course, he can force the truth of =CH. 

Finally, we note that the name “generic” for the sets G (for any fixed 𝔓)
is apt. They all “code the same information”, i.e., it does not matter which
particular M-generic G ⊆ |𝔓| we choose. If M is a CTM for ZFC, then M[G]
is a CTM for ZFC. Moreover, if a 𝔓 forces some additional properties (such as
¬CH) to be true in M[G], this is so for any M-generic G ⊆ |𝔓|.


VIII.7. Applications


We conclude our lectures by presenting in this section elementary applications
of forcing. They all are based on PO sets of finite functions. We recall from
Section VI.8 that finiteness is absolute for CTMs of ZFC and therefore an
inhabitant of such a CTM proclaims a set “finite” exactly when such a set is
“really” finite. We will benefit from widening the scope of Examples VIII.1.3,
VIII.1.6, and VIII.1.9.


VIII.7.1 Example. Let† 𝔉(a, b) = (F(a, b), ≤, 1) be a PO set defined as
follows in terms of sets a and b where a is infinite and b ≠ ∅:‡


F(a, b) = {p | p : a → b is a finite function}


† No connection between this “𝔉” and Gödel operations of Section VI.9.
‡ We have used “{p | ...}” rather than our usual “{p : ...}” because a “p : a ...” follows a few
symbols away.

1 is the function ∅ : a → b with empty domain. p ≤ q will mean p ⊇ q
(reverse inclusion). Let M be a CTM and 𝔉 ∈ M. Then absoluteness of pairing
and finiteness allows the preceding definition to take place inside M with
the same result as if it were effected in U_M. Let G ⊆ F be M-generic. Then
f = ⋃G is a function a → b that is total and onto. That it is a function
follows from the compatibility of p and q in G as in VIII.1.9. For totalness
one observes (as in VIII.1.9) that the set D_i = {p ∈ F : p(i)↓} is dense in
M for all i ∈ a. For ontoness we employ the dense sets R_i, for i ∈ b, where
R_i = {p ∈ F : i ∈ ran(p)}. Indeed, let q ∈ F. If q(i)↓, then q ∈ D_i. Otherwise,


D_i ∋ q ∪ {(i, j)} ≤ q


where we have picked j ∈ b arbitrarily (b ≠ ∅). Similarly, if i ∈ ran(q), then
q ∈ R_i, else


R_i ∋ q ∪ {(j, i)} ≤ q


where we have picked j ∈ a − dom(q) (a is infinite in M).

Thus, G ∩ D_i ≠ ∅ ≠ G ∩ R_j for all i ∈ a and all j ∈ b. That is, for all these
i, j there are finite subfunctions of f, say p and q, with p(i)↓ and j ∈ ran(q).
That is, we have verified totalness and ontoness.

As in VIII.1.9, G ∉ M if b has two or more elements. Indeed, under the
present circumstances, if G ∈ M, then so is F − G, and we get a contradiction
as follows: First, F − G is dense. Indeed, let q ∈ F. Pick i ∈ a − dom(q) and
j, m distinct members of b. Then


q ∪ {(i, j)} ⊥ q ∪ {(i, m)}


Thus one of these extensions of q must be in F − G. Having verified the density
of this latter set, we now have (F − G) ∩ G ≠ ∅.


It is useful to rephrase the fact that G ∉ M: M ⊂ M[G].
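
The density arguments of this example are simple enough to run on concrete data. The following is a minimal sketch, assuming that conditions are coded as Python dicts and that the infinite set a is handed over as an iterator (here the natural numbers); the function names extend_into_D, extend_into_R, leq and compatible are illustrative and do not come from the text.

```python
from itertools import count

def extend_into_D(q, i, b):
    """Return an extension of the finite function q that is defined at i."""
    if i in q:
        return dict(q)
    j = next(iter(b))                 # b is nonempty, so some value j exists
    return {**q, i: j}

def extend_into_R(q, i, a_iter):
    """Return an extension of q with i in its range; a_iter enumerates the infinite set a."""
    if i in q.values():
        return dict(q)
    j = next(x for x in a_iter if x not in q)   # a fresh argument j in a - dom(q)
    return {**q, j: i}

def leq(p, q):
    """p <= q iff p extends q as a set of pairs (reverse inclusion)."""
    return q.items() <= p.items()

def compatible(p, q):
    """p and q have a common extension iff they agree on their common domain."""
    return all(p[k] == q[k] for k in p.keys() & q.keys())
```

For instance, with q = {0: 1, 3: 0}, the call extend_into_D(q, 5, [0, 1]) adds the pair (5, 0), and extend_into_R(q, 7, count()) adds (1, 7), since 1 is the first natural number outside dom(q). The two extensions q ∪ {(5, 0)} and q ∪ {(5, 1)} are reported as incompatible, which is the step used to show that F − G is dense.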


Absoluteness results of Sections VI.8 and VI.9 (in particular VI.9.20) show that
the function α ↦ F_α of VI.9.2 is absolute for any CTM of ZF as long as we
take N of L_N in M. Thus, if M is such a CTM, then


L_N^M = {F_α : α ∈ On}^M = {F_α : α ∈ On^M} (1)


If now M is a CTM of ZFC and we take a = ω and b = 2 in Example VIII.7.1,
we have, for any M-generic G, that M[G] is a CTM of ZFC (Theorem VIII.6.4)
and, moreover,


L_N^{M[G]} = {F_α : α ∈ On}^{M[G]} = {F_α : α ∈ On^{M[G]}} (2)

Since On^M = On^{M[G]} (by VIII.6.2), (1) and (2) yield

L_N^M = L_N^{M[G]}


or, in words, M and M[G] have the same constructible sets.

Now, by the conclusion of the preceding example and in view of what we have
just remarked, any x ∈ M[G] − M is a set (M and M[G] have the same atoms;
cf. VIII.2.9) that is not constructible in M[G]. Thus, on the assumption that M
is a model of ZFC, we now have a model of ZFC + (V ≠ L), namely M[G].
More specifically, it turns out that not only G ∉ M, but also ⋃G ∉ M (see
Exercise VIII.2). This function, viewed as a characteristic function, defines
a subset of ω. This subset is not in M, and hence is not constructible in M[G].
For the record we state all this as follows: ©


VIII.7.2 Proposition (Cohen). It is consistent with ZFC that non-constructible
sets exist. In fact, it is consistent with ZFC that a non-constructible subset of ω
exists.


VIII.7.3 Example (Collapsing Cardinals). Once more we look at Example
VIII.7.1. This time we fix a CTM of ZFC, M, and take a = ω and b = ℵ_1^M.


Of course, ℵ_1^M is the smallest ordinal α > ω in M for which there is no 1-1
correspondence f : ω → α in M. ©


Let G be M-generic with respect to the PO set 𝔉(ω, ℵ_1^M), and consider
g = ⋃G. We know that g ∈ M[G]. We also know that g : ω → α is total
and onto in M[G] (recall that ordinals are absolute, and M and M[G] have
the same ordinals). Thus, α is not an uncountable cardinal in M[G]; it is just
an at most countable ordinal (in view of g). We say that the cardinal ℵ_1^M was
collapsed as we passed from M to its generic extension M[G]. Therefore ℵ_1^M is
not an uncountable cardinal in M[G], that is, ℵ_1^M < ℵ_1^{M[G]} in M[G]. Of course,
all along, ℵ_1^M is just that α above.


This phenomenon of cardinals collapsing, a witness to the fact that being a
cardinal is not absolute for CTMs, is annoying because it causes more work
towards proving the relative consistency of ¬CH.


Of course, at most countable cardinals do not collapse, by absoluteness of ω
and of finite ordinals. Moreover, going backwards, all cardinals are preserved.


VIII.7.4 Proposition. Let α be an ordinal in M[G] such that (α is a
cardinal)^{M[G]}. Then (α is a cardinal)^M.


Proof. α is an ordinal in M by VIII.6.2. Suppose that the conclusion is false, and
let β < α and f : β → α be a 1-1 correspondence in M. Now “f is a 1-1
correspondence” is absolute for transitive models of ZF, and β and f are in M[G].
This contradicts that α is a cardinal in M[G].


VIII.7.5 Example (Towards the Relative Consistency of ¬CH). We turn
once again to Example VIII.7.1. This time we fix a CTM of ZFC, M, and take
an 𝔉(a, b) ∈ M with a = ω × κ and b = 2, where κ is an uncountable cardinal
in M, or

(κ is an uncountable cardinal)^M (1)

Let G be M-generic and form M[G]. We know that if we let f = ⋃G, then

f : ω × κ → 2 is total and onto (2)
and

f ∉ M by Exercise VIII.2 (3)


As a matter of fact, α ↦ λn.f(n, α)† is 1-1 on κ: Indeed, for any α ≠ β in κ
consider the set in M,


D = {p ∈ F(ω × κ, 2) : (∃n ∈ ω)(p(n, α)↓ ∧
p(n, β)↓ ∧ p(n, α) ≠ p(n, β))}


D is dense, as we can extend any q ∈ F(ω × κ, 2) by adding the triples
((n, α), 0) and ((n, β), 1) to q, for some n such that q(n, α)↑ and q(n, β)↑
(there are plenty of such n, by finiteness of q). Now G ∩ D ≠ ∅ translates to
f(n, α) ≠ f(n, β) for some n ∈ ω; hence λn.f(n, α) ≠ λn.f(n, β).
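
The density of D rests on a finite search for an unused row n, which can be spelled out as follows. This is a sketch only: conditions are assumed to be Python dicts keyed by pairs (n, index), the indices alpha and beta stand for two distinct ordinals below κ, and the names separate_columns and in_D are illustrative.

```python
from itertools import count

def separate_columns(q, alpha, beta):
    """Extend q so that columns alpha and beta (alpha != beta) differ at some fresh row n."""
    for n in count():
        # finiteness of q guarantees that such an n is eventually found
        if (n, alpha) not in q and (n, beta) not in q:
            return {**q, (n, alpha): 0, (n, beta): 1}

def in_D(p, alpha, beta):
    """p is in D: at some row n both columns are defined and carry different values."""
    return any((n, beta) in p and p[(n, alpha)] != p[(n, beta)]
               for (n, a) in p if a == alpha)
```

Starting from q = {(0, 3): 1, (1, 5): 1} with alpha = 3 and beta = 5, rows 0 and 1 are each partly used, so the extension is made at row 2; the result lies in D while q itself does not.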

Since the function f is in M[G], so is α ↦ λn.f(n, α). The latter being 1-1
and total (on κ), it establishes


(Card(^ω2) ≥ Card(κ))^{M[G]} (4)
or
(2^{ℵ_0})^{M[G]} ≥ Card^{M[G]}(κ) (5)


Of course, the ordinal κ is an ordinal to inhabitants of both worlds, M and
M[G], but it would be reckless to assume, a priori, that κ is a cardinal in
M[G] just by virtue of being so in M; after all, we saw that cardinals may
collapse. Hence our conservatism in using “Card^{M[G]}(κ)” rather than just “κ”
in (4) and (5) above.

Now, outside the context of a CTM, to deny CH (that says 2^{ℵ_0} = ℵ_1) is to
manage to get 2^{ℵ_0} > ℵ_1, in other words


2^{ℵ_0} ≥ ℵ_2


In view of (5) and the above (relativized to M[G]) it would then suffice to
take κ = ℵ_2^M and prove that uncountable cardinals, at least the two particular
ones ℵ_1^M and ℵ_2^M, are preserved as we pass from M to a generic extension (with
respect to the present 𝔉(a, b)) M[G], that is, we would like to show ℵ_1^M = ℵ_1^{M[G]}
and ℵ_2^M = ℵ_2^{M[G]}.


† Or λα.(λn.f(n, α)).

We will do that through a sequence of lemmata. 


Pause. If $\aleph_2^{M}$ collapses, then we are in trouble: (5) does nothing for us, because
the ordinal $\aleph_2^{M}$ is countable in $M[G]$. If $\aleph_1^{M}$ collapses, even if $\aleph_2^{M}$ might not,
we are still in trouble, for $\aleph_2^{M}$ is not the second infinite cardinal in $M[G]$ (that
is, $\aleph_2^{M[G]}$) now that $\aleph_1^{M}$ has collapsed.


VIII.7.6 Remark. Since for each $\alpha < \kappa$, $\lambda n. f(n, \alpha)$ codes a real number in
$[0, 1]$ (in binary notation), we say that $\lambda n. f(n, \alpha)$ is a Cohen generic real. Thus
we have added (these objects are new, by Exercise VIII.2) $\kappa$ generic reals to
the ground model $M$. Intuitively, this set of reals turns out to be so huge that,
in $M[G]$, it has cardinality large enough to allow some cardinalities below it,
but above $\omega$.


VIII.7.7 Definition (The $\kappa$-Antichain Condition). Let $\kappa$ be an infinite cardinal.
A PO set $\mathfrak{P} = (P, \leq, 1)$ has the $\kappa$-antichain condition, for short $\kappa$-a.c., if
every antichain $A \subseteq P$ has cardinality $< \kappa$.


In much of the literature the $\kappa$-antichain condition is called the $\kappa$-chain condition
or $\kappa$-c.c. In particular, when $\kappa = \aleph_1$ one then speaks of the countable chain
condition or c.c.c.


VIII.7.8 Lemma. Let $M$ be a CTM of ZFC, and $\mathfrak{P} = (P, \leq, 1)$ a PO set in
$M$ that has $\kappa$-a.c. in $M$.† Assume that $\kappa$ is a regular uncountable cardinal in
$M$, that is, $(\omega < \kappa \wedge \kappa \text{ is regular})^{M}$ holds. Then $\kappa$ is also a cardinal in $M[G]$
for any $M$-generic $G \subseteq P$.

† The “in $M$” cannot be emphasized enough. Since $M$ is countable, a resident of $U_M$ trivially sees
that every antichain in $M$ is countable.


Proof. Suppose instead that the hypotheses hold, yet there is an $M$-generic $G$ and an
onto function in $M[G]$, $f : \alpha \to \kappa$, where $\alpha < \kappa$ in $M[G]$. That is, the ordinal $\kappa$
is not a cardinal in $M[G]$. By VIII.6.2, $\alpha$ is an ordinal in $M$ as well.


There is a formula $\mathscr{A}(x, y, z)$ of $L_{\mathrm{Set}}$ that says “$x : y \to z$ is an onto function”.
Thus, if $t \in M$ is such that $f = t^{G}$, then, for some $p \in G$ (we fix one
such $p$), VIII.3.5 implies

$$p \Vdash^{w} \mathscr{A}(t, \check{\alpha}, \check{\kappa})$$

In words, we have the following sentence true in $M$:

$$p \Vdash^{w} t : \check{\alpha} \to \check{\kappa} \text{ is onto} \tag{1}$$

where the $\check{\ }$-function is that of VIII.2.6.


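For every $\beta < \alpha$ we let

$$B_\beta = \{\gamma < \kappa : (\exists q \leq p)\ q \Vdash^{w} t(\check{\beta}) = \check{\gamma}\} \tag{2}$$

$B_\beta \in M$ for all $\beta < \alpha$, since, by the definability lemma, the expression to the
right of “:” is a formula relativized to $M$. We next majorize the cardinality in
$M$ of the $B_\beta$, i.e., estimate (“from above”) $\mathrm{Card}^{M}(B_\beta)$. To this end, let us pick
for each $\gamma \in B_\beta$ one $q$ that works in (2) above. We denote this $q$ by $q_\gamma$.

Assume now that $\gamma \neq \delta$ are both in $B_\beta$. We will argue that $q_\gamma \perp q_\delta$: If not,
let $r \leq q_\gamma$ and $r \leq q_\delta$. Now, by definition of the symbol “$q_\gamma$”, $q_\gamma \Vdash^{w} t(\check{\beta}) = \check{\gamma}$
and $q_\delta \Vdash^{w} t(\check{\beta}) = \check{\delta}$; hence, by monotonicity (VIII.3.3(2)), $r \Vdash^{w} t(\check{\beta}) = \check{\gamma}$ and
$r \Vdash^{w} t(\check{\beta}) = \check{\delta}$. Let $G' \ni r$ be some $M$-generic set (cf. VIII.1.8). The truth
lemma yields $\gamma = t^{G'}(\beta) = \delta$ in $M[G']$ (recall that ordinals are preserved, and
both $\gamma$ and $\delta$ are in $M$, for $\kappa$ is). This is a contradiction.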


Pause. While $G' \neq G$ in $U_M$ in general, and the same is true of $t^{G'}$ versus
$t^{G} = f$, these objects, $G'$ and $t^{G'}$, were just intermediate agents towards
deriving the contradiction $\gamma = \delta$.


Thus, in $M$, $B_\beta$ maps 1-1 into some antichain $C$ that contains the $q_\gamma$ objects
for the various $\gamma \in B_\beta$. Therefore, $\mathrm{Card}(B_\beta) \leq \mathrm{Card}(C) < \kappa$ is true in $M$,† the
“$<$” contributed by the $\kappa$-a.c. of $\mathfrak{P}$ in $M$. Since $\kappa$ is regular in $M$, the following
is true in $M$ by VII.6.11:

$$\mathrm{Card}\Bigl(\bigcup_{\beta < \alpha} B_\beta\Bigr) < \kappa \tag{3}$$

We will next contradict (3) by proving, in $M$,

$$\kappa \subseteq \bigcup_{\beta < \alpha} B_\beta \tag{4}$$

This shows that the assumption that we have an $\alpha$ and $f$ in $M[G]$ with the
stated properties is untenable, thus proving the lemma.

Towards (4), let us argue in $M$, and let $\gamma < \kappa$ (i.e., $\gamma \in \kappa$). Since $f$ is onto
$\kappa$ (from $\alpha$; this happens in $M[G]$), let $\beta < \alpha$ be such that $f(\beta) = \gamma$. Thus, by
the truth lemma, some $q \in G$ satisfies

$$q \Vdash^{w} t(\check{\beta}) = \check{\gamma} \tag{5}$$

We will be done if we can say “without loss of generality, $q \leq p$” for the $p$ we
have fixed at the outset of our proof (cf. (1)), for then $\gamma \in B_\beta$ by (2). To settle
the phrase in quotes, let $r \in G$ be such that $r \leq q$ and $r \leq p$; it exists because
both $q$ and $p$ are in $G$. Then $r \Vdash^{w} t(\check{\beta}) = \check{\gamma}$ by monotonicity (and the fact that
$q$ works).

† Written without the $M$-superscript, since we have said “is true in $M$” (cf. VI.8.4).


VIII.7.9 Definition. Let $M$ be a CTM of ZFC, and $\mathfrak{P} = (P, \leq, 1)$ a PO set in
$M$. We say that $\mathfrak{P}$ preserves cardinals just in case for every $M$-generic $G \subseteq P$
and every ordinal $\alpha$ of $M$ (i.e., $\alpha \in \mathrm{On}^{M}$), if $\alpha$ is a cardinal in $M$, then it is also
a cardinal in $M[G]$.


By absoluteness of $\omega$ and below, finite or countable cardinals are always
preserved (forward, from $M$ to $M[G]$). By VIII.7.4 cardinals are also preserved
backwards.

Now, Lemma VIII.7.8 gives a sufficient condition for the preservation of all
regular cardinals (in $M$) above $\kappa$ (clearly, if $\mathfrak{P}$ has the $\kappa$-a.c. in $M$, it also has
the $\lambda$-a.c. in $M$ for all cardinals $\lambda > \kappa$). The following strengthens all this a bit,
by dropping the qualification regular.


VIII.7.10 Corollary. Let $M$ be a CTM of ZFC, and $\mathfrak{P} = (P, \leq, 1)$ a PO set
in $M$ that has the $\aleph_1$-a.c. in $M$. Then $\mathfrak{P}$ preserves cardinals.


Proof. We only worry about what happens beyond $\omega$. By the remarks above,
if $\kappa = \aleph_{\alpha+1}^{M}$, a successor cardinal, then $\kappa$ is preserved, since it is regular in $M$
(see VII.6.12). Suppose now that $\kappa = \aleph_\alpha^{M}$ and $\mathrm{Lim}(\alpha)$, that is, a limit cardinal.†
Thus $\kappa = \bigcup_{\beta < \alpha} \aleph_\beta^{M}$. By Remark VII.4.24(2), $\kappa = \bigcup_{\beta < \alpha} \aleph_{\beta+1}^{M}$ and all $\aleph_{\beta+1}^{M}$ are
preserved. Hence $\kappa$, being in $M[G]$ a union of a set of cardinals, is itself a cardinal in $M[G]$.

† $\mathrm{Lim}(\alpha)$ is absolute for CTMs.


Our next task is to show that the particular PO set of Example VIII.7.5 has
the $\aleph_1$-a.c. (or c.c.c. in the alternative terminology). We will need a definition
and two more lemmata.


VIII.7.11 Definition ($\Delta$-Systems). A family of sets $A$ is called a $\Delta$-system, or
a quasi-disjoint family, provided that there is a set $r$, the root of $A$, such that
for any two $a \neq b$ in $A$, $a \cap b = r$.


The name “$\Delta$-system” is suggested by the shape of a quasi-disjoint family
as sketched below (the members of the family depicted below are $r \cup a$, $r \cup b$,
$r \cup c$, etc.):

[Figure: sketch of a quasi-disjoint family with common root $r$ and members $r \cup a$, $r \cup b$, $r \cup c$.]


VIII.7.12 Lemma ($\Delta$-System Lemma). If the set $A$ is an uncountable family
of finite sets, then there is an uncountable $B \subseteq A$ that is a $\Delta$-system.


Proof. This is argued in ZFC (or in $U_M$).

First off, for each $n \in \omega$, let $C_n = \{X \in A : \mathrm{Card}(X) = n\}$. There must be an
$n \in \omega$ such that $C_n$ is uncountable; otherwise $\mathrm{Card}(A) \leq \aleph_0$, since $A = \bigcup_{n \in \omega} C_n$
(this uses AC; see VII.2.13). Thus, without loss of generality, we assume that
there is some fixed $n \in \omega$ such that, for all $X \in A$, $\mathrm{Card}(X) = n$.† Let us then
prove the lemma by induction on $n$.

† That is, we work with an uncountable $C_n$, call it “$A$”, and discard the original $A$.


For the basis, $n = 1$, it suffices to take $B = A$ and $r = \emptyset$.
We proceed to the $n + 1$ case, based on the obvious I.H.


Case 1. There is an $a$ such that $S = \{X \in A : a \in X\}$ is uncountable. By the
I.H., let $D$ be an uncountable quasi-disjoint subfamily of $\{X - \{a\} : X \in S\}$
with root $r$. Then $B = \{X \cup \{a\} : X \in D\}$ with root $r \cup \{a\}$ is what we want.


Case 2. There is no such $a$ as above. We then define by recursion (on ordinals
$< \aleph_1$) a transfinite sequence in $A$:

$$X_0 = \text{some arbitrary set in } A$$
$$X_\alpha = \text{some arbitrary set in } A \text{ that is disjoint from } \bigcup_{\beta < \alpha} X_\beta$$

Assuming that the recursion above is legitimate, then $\{X_\alpha : \alpha < \aleph_1\}$ is
uncountable and quasi-disjoint. Indeed, the $X_\alpha$ are pairwise disjoint, so that $r = \emptyset$
works.

But why is the second step in the recursion possible for any $\alpha < \aleph_1$? The
set $\mathcal{Y} = \{Y \in A : Y \notin \{X_\beta : \beta < \alpha\}\}$ is uncountable; otherwise

$$\begin{aligned}
\mathrm{Card}(A) &\leq \aleph_0 \,(\text{that is, } \mathrm{Card}(\mathcal{Y})) +_c \mathrm{Card}\{X_\beta : \beta < \alpha\} \\
&= \aleph_0 +_c \mathrm{Card}(\alpha) && \text{since } \beta \mapsto X_\beta \text{ is 1-1 and total on } \alpha \\
&= \aleph_0 +_c \aleph_0 && \text{by } \mathrm{Card}(\alpha) \leq \alpha < \aleph_1 \\
&= \aleph_0
\end{aligned}$$

We will be done if we can argue that at least one $Y \in \mathcal{Y}$ is disjoint from all
$X_\beta$, $\beta < \alpha$. Any such $Y$ can then be chosen to be $X_\alpha$.


Suppose instead that every $Y \in \mathcal{Y}$ intersects $\bigcup_{\beta < \alpha} X_\beta$. Then there is a
$\beta_0 < \alpha$ such that $X_{\beta_0} \cap Y \neq \emptyset$ for uncountably many among the $Y \in \mathcal{Y}$;
otherwise, $\mathcal{Y}$ is a countable union of countable sets $\mathcal{Y}_\beta = \{Y \in \mathcal{Y} : X_\beta \cap Y \neq \emptyset\}$,
for $\beta < \alpha$. Fixing attention on that $\beta_0$, we prove that some $a \in X_{\beta_0}$ is in
uncountably many $Y$, contradicting the case we are arguing under. Well, if not,
let for each $a \in X_{\beta_0}$

$$\mathcal{Y}_a = \{Y \in \mathcal{Y} : a \in Y\}$$

Each $\mathcal{Y}_a$ is countable; hence ($X_{\beta_0}$ being finite) so is $\bigcup_{a \in X_{\beta_0}} \mathcal{Y}_a$. But this union
is the set of $\mathcal{Y}$-sets that $X_{\beta_0}$ intersects, and that is uncountable.
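The definition of a quasi-disjoint family can be checked mechanically on small finite examples. The Python sketch below is only an illustration of the definition, not of the lemma (which concerns uncountable families and needs the recursion above); the function names `is_delta_system` and `largest_delta_subfamily` are hypothetical, and the search is brute force, so it is meant for tiny inputs.

```python
from itertools import combinations


def is_delta_system(family):
    """Return (True, root) if all pairwise intersections of distinct members coincide."""
    sets = [frozenset(x) for x in family]
    roots = {a & b for a, b in combinations(sets, 2)}
    if len(roots) > 1:
        return False, None
    # With fewer than two members there is no constraint; use the empty root.
    root = next(iter(roots)) if roots else frozenset()
    return True, root


def largest_delta_subfamily(family):
    """Brute-force search for a largest quasi-disjoint subfamily (exponential time)."""
    # dict.fromkeys removes duplicates while keeping the input order.
    sets = list(dict.fromkeys(frozenset(x) for x in family))
    for size in range(len(sets), 0, -1):
        for sub in combinations(sets, size):
            ok, root = is_delta_system(sub)
            if ok:
                return set(sub), root
    return set(), frozenset()


family = [{1, 2}, {1, 3}, {1, 4}, {2, 3}, {5, 6}]
sub, root = largest_delta_subfamily(family)
```

For this sample family the search returns the three sets containing 1, with root {1}; the pairwise disjoint triple {1, 4}, {2, 3}, {5, 6} is an equally large quasi-disjoint subfamily with empty root, which mirrors the two cases of the proof.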


By the concluding remarks of Example VIII.7.5, all that remains to be done
is the following lemma:


VIII.7.13 Lemma. Let $M$ be a CTM of ZFC, and $\mathfrak{F}(a, b) = (F(a, b), \leq, 1)$
a PO set in $M$, where $a = \omega \times \aleph_2^{M}$ and $b = 2$. Then $\mathfrak{F}(a, b)$ has the $\aleph_1$-a.c.
(or c.c.c.) in $M$.


Proof. The argument is carried out inside $M$.


$F(a, b)$ is uncountable. Arguing by contradiction, let $A \subseteq F(a, b)$ be an
uncountable antichain. The set

$$B = \{\mathrm{dom}(p) : p \in A\} \tag{1}$$

is also uncountable. If not, $A \subseteq \bigcup_{s \in B} \{p \in F(a, b) : \mathrm{dom}(p) = s\}$, a countable
set, since for each finite $s \subseteq \omega \times \aleph_2^{M}$ the cardinality of ${}^{s}2$ is finite ($= 2^{\mathrm{Card}(s)}$),
and thus $\{p \in F(a, b) : \mathrm{dom}(p) = s\}$ is finite. Let $D \subseteq B$ be an uncountable
$\Delta$-system of root $r$, and set $A_D = \{p \in A : \mathrm{dom}(p) \in D\}$. This is uncountable
due to the onto map $p \mapsto \mathrm{dom}(p)$.

Now, $\{p \upharpoonright r : p \in A_D\}$ is finite, hence there are plenty of $p$ and $q$ in $A_D$, indeed
uncountably many, with $p \neq q$ and $p \upharpoonright r = q \upharpoonright r$. Since only finitely many conditions
share any one domain, we may also take $\mathrm{dom}(p) \neq \mathrm{dom}(q)$. But then $p$ and $q$ are compatible,
since $\mathrm{dom}(p)$ and $\mathrm{dom}(q)$ are in $D$, and therefore $\mathrm{dom}(p) \cap \mathrm{dom}(q) = r$. But
also $p \perp q$, since both are in $A$.
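The compatibility step at the end of the proof can be illustrated on finite data. The Python sketch below assumes, as the proof does, that conditions are finite partial functions into 2 and that two conditions are compatible exactly when their union is again a function; conditions are modelled as dicts, and the names `compatible`, `restrict`, and `is_antichain` are hypothetical helpers.

```python
def compatible(p, q):
    """Two finite partial functions are compatible iff they agree on their common domain."""
    return all(q[k] == v for k, v in p.items() if k in q)


def restrict(p, r):
    """Restriction of the partial function p to the set r."""
    return {k: v for k, v in p.items() if k in r}


def is_antichain(conditions):
    """True if the listed conditions are pairwise incompatible."""
    return all(
        not compatible(p, q)
        for i, p in enumerate(conditions)
        for q in conditions[i + 1:]
    )


# The two domains meet exactly in the root {(0, 0)}, and p, q agree there.
root = {(0, 0)}
p = {(0, 0): 1, (1, 0): 0}
q = {(0, 0): 1, (2, 1): 1}
common = {**p, **q}  # the union, a common extension of p and q
```

Because the domains overlap only in the root and the restrictions to the root coincide, the union `common` extends both conditions, so `p` and `q` cannot belong to the same antichain.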


For the record, we now have


VIII.7.14 Corollary (Cohen). With $M$ and $\mathfrak{F}(a, b)$ as above, if $G$ is any
$M$-generic set, then $(\neg\mathrm{CH})^{M[G]}$ is true.


Thus, a model of ZFC leads to a model of ZFC + $\neg$CH.


VIII.8. Exercises


VIII.1. In the definition of generic sets (VIII.1.7) we have required $G$ to
be a filter definitionally. Prove that in the presence of the density
requirement we get that $G$ is a filter for free, relaxing requirement (2)
in the definition of a filter (VIII.1.4) as follows: We only ask that any
two $p$ and $q$ in $G$ be compatible (without asking for a witness in $G$).
(Hint. Fix $p$ and $q$ in $G$. It helps to prove that the following set is
dense: $\{r \in |\mathfrak{P}| : r \perp p \vee r \perp q \vee (r \leq p \wedge r \leq q)\}$.)

VIII.2. Refer to Example VIII.7.1, and take $a = \omega$ and $b = 2$. We have seen
that if $M$ is a CTM (of, say, ZF) with $\mathfrak{F}(\omega, 2) \in M$ and $G$ is any
$M$-generic set, then $G \notin M$. We also know that $f = \bigcup G$ is a function
and $f \in M[G]$. Prove that $f \notin M$.
(Hint. Let $g : \omega \to 2$ be in $M$. With the help of the set $\{p \in F(\omega, 2) :
(\exists n \in \omega)(p(n)\downarrow \wedge\ p(n) \neq g(n))\}$, prove that $f \neq g$.)

VIII.3. If $M$ is a transitive model of ZF $-$ P, then the following are absolute for
$M$, where we write $\pi_i$, $i = 1, 2, 3$, for the $i$th projection of $(x, y, z)$:
(a) $P_\omega(A)$, where $P_\omega(A) = \{x : x \subseteq A \wedge x \text{ is finite}\}$.
(b) $x$ is a PO set.
(c) $x$ is a PO set $\wedge\ y \in \pi_1(x) \wedge z \in \pi_1(x) \wedge y \perp z$.
(d) $x$ is a PO set $\wedge\ y \subseteq \pi_1(x) \wedge \neg U(y) \wedge y$ is open.
(e) $x$ is a PO set $\wedge\ y \subseteq \pi_1(x) \wedge \neg U(y) \wedge y$ is dense.
(f) $x$ is a PO set $\wedge\ y \subseteq \pi_1(x) \wedge \neg U(y) \wedge \neg U(z) \wedge y$ is $z$-generic.

VIII.4. Prove that $\lambda x G. x^{G}$ is absolute for transitive models of ZF $-$ P.

VIII.5. Prove that $\lambda x \mathfrak{P}. \check{x}$ of VIII.2.6 is absolute for transitive models of
ZF $-$ P.

VIII.6. Provide all the necessary details that show In and Ne are absolute for
any transitive model of ZF $-$ P.


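VIII.7. Choose an $M$, a PO set $\mathfrak{P}$ in $M$, and $G \subseteq M$ (not necessarily generic).
Show that $(x \cup y)^{G} = x^{G} \cup y^{G}$ for all $x$ and $y$ in $M$.

VIII.8. Simplify the expressions, giving your answer in terms of $\Vdash$ or $\Vdash^{w}$
(that is, $\models$ should not figure in the final answer).
(a) $p \Vdash^{w} \neg\mathscr{A}$
(b) $p \Vdash^{w} \mathscr{A} \vee \mathscr{B}$
(c) $p \Vdash^{w} (\forall x)\mathscr{A}(x, \vec{y})$
(d) $p \Vdash (\forall x)\mathscr{A}(x, \vec{y})$
(e) $p \Vdash \mathscr{A} \wedge \mathscr{B}$
(f) $p \Vdash (\exists x)\mathscr{A}(x, \vec{y})$

VIII.9. Let $M$ be a CTM of ZFC, $\mathfrak{P}$ a PO set in $M$, $p \in |\mathfrak{P}|$, and $\mathscr{A}$ a sentence
over $L_{\mathrm{Set},M}$. Assume that $D = \{q \in |\mathfrak{P}| : q \Vdash^{w} \mathscr{A}\}$ is dense below
$p$, which means that for every $r \leq p$ there is some $q \leq r$ in $D$. Prove that
$p \Vdash^{w} \mathscr{A}$.

VIII.10. Given sentences $\mathscr{A}_1, \ldots, \mathscr{A}_n, \mathscr{B}$ over $L_{\mathrm{Set},M}$, for some CTM of
ZFC, $M$, and a PO set $\mathfrak{P}$ in $M$. Prove that if we have $\mathscr{A}_1, \ldots, \mathscr{A}_n \vdash_{\mathrm{ZFC}} \mathscr{B}$
and $p \Vdash^{w} \mathscr{A}_i$ ($i = 1, \ldots, n$, $p \in |\mathfrak{P}|$), then we also have $p \Vdash^{w} \mathscr{B}$.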
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